> Toddy Prawiraharjo wrote:
> >
> > From: Rob Dixon [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 29 September 2006 10:04 AM
> > To: beginners@perl.org
> > Subject: Re: Querying very big database
> >>
> >> Toddy Prawiraharjo wrote:
> >>
> >>> Hello all,
> >>>
> >>> I am not sure if I'm on the correct mailing list, but this should be
> >>> simple for you guys. Recently I started a simple Perl program to catch
> >>> all the syslog from my NetScreen firewall and put it into a database,
> >>> so that I can do some analysis on it later (sort of like AWStats). The
> >>> database itself is pretty big, with 600k entries (for 3 weeks of the
> >>> running firewall) worth 80 megs in MySQL. My question is: the analysis
> >>> in my Perl script is very slow. I know the query to MySQL itself is
> >>> pretty quick, less than 2 seconds to return about 40k to 60k results
> >>> to Perl to be analysed.
> >>>
> >>> Here it goes:
> >>>
> >>> <snipped>
> >>> $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
> >>>     '$fromdate' AND start_date <= '$todate';";
> >>> $sth = $dbh->prepare($query);
> >>> $sth->execute() || &errorhere("SQL Error: $DBI::errstr");
> >>> print "$query<br>";
> >>>
> >>> my %src = ();
> >>> my ($totalsent, $totalrcvd, $srcchecked);
> >>>
> >>> while (@row = $sth->fetchrow_array) {
> >>>     $srcchecked = 0;
> >>>     while (($source, $total) = each %src) {
> >>>         if ($source eq "$row[0]") {
> >>>             $srcchecked = 1;
> >>>             $src{$source} = $src{$source} + $row[1];
> >>>         }
> >>>     }
> >>>     if ($srcchecked != 1) {
> >>>         print "$row[0]<br>";
> >>>         $src{$row[0]} = $row[1];
> >>>     }
> >>>     $totalrcvd = $totalsent + $row[1];
> >>>     #$totalrcvd = $totalrcvd + $row[2];
> >>> }
> >>> </snipped>
> >>>
> >>> The while loop doing the analysis takes more than 15 minutes, and that
> >>> is only for a query over 1 day of records ($fromdate-$todate). So, if
> >>> I want Perl to give me a faster result, what's the solution? Did I
> >>> take a fundamentally wrong approach?
> >>
> >> You've missed the whole point of hashes, which is that they will access
> >> a data value directly from a key. What you've written is similar to
> >>
> >>     my $i = 0;
> >>     my $value;
> >>     foreach my $v (@array) {
> >>         if ($i++ == $n) {
> >>             $value = $v;
> >>         }
> >>     }
> >>
> >> instead of
> >>
> >>     $value = $array[$n];
> >>
> >> Try this while loop instead of the one you have and see if you get an
> >> improvement:
> >>
> >>     while (@row = $sth->fetchrow_array) {
> >>
> >>         my $key = $row[0];
> >>
> >>         print "$key<br>" unless exists $src{$key};
> >>         $src{$key} += $row[1];
> >>
> >>         $totalsent += $row[1];
> >>         #$totalrcvd += $row[2];
> >>     }
> >>
> >> HTH,
> >>
> >> Rob
> >
> > THX! I always knew it was my n00b scripting that caused the problem. It
> > is now down to 4 lines inside the while loop, and I'm flying! The
> > processing is down from ~15 minutes to 4 secs! But for a longer date
> > range (2 weeks) it clocked in at 25 wallclock secs (processing ~350k
> > MySQL result rows). Is there any way to process this beast even faster?
> > Any good reading or references on this? I'm just afraid that if this is
> > rolled out to production, the report for a month's activity will take
> > minutes to make!
> >
> > <snipped>
> > $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
> >     >= '$fromdate' AND start_date <= '$todate';";
> >
> > $sth = $dbh->prepare($query);
> > $sth->execute() || &errorhere("SQL Error: $DBI::errstr");
> > #print "$query<br>";
> >
> > my %src = ();
> > my ($totalsent, $totalrcvd);
> >
> > while (@row = $sth->fetchrow_array) {
> >     my $key = $row[0];
> >     $src{$key} += $row[2];
> >     $totalsent = $totalsent + $row[1];
> >     $totalrcvd = $totalrcvd + $row[2];
> > }
> > </snipped>
>
> Hi Toddy
>
> (Please bottom-post your replies. One day everybody in the world will
> know to do this and perl.beginners will be a happy, shiny place!)
>
> Let the database engine do it for you!
>
>     my $query = qq(
>         SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
>         FROM no_name
>         WHERE start_date >= '$fromdate' AND start_date <= '$todate'
>         GROUP BY src
>     );
>
>     my $sth = $dbh->prepare($query);
>     $sth->execute() || errorhere("SQL Error: $DBI::errstr");
>
>     my $src = $sth->fetchall_hashref('src');
>
>     foreach my $srcval (keys %$src) {
>         $totalsent += $src->{$srcval}{totalsent};
>         $totalrcvd += $src->{$srcval}{totalrcvd};
>     }
>
> (I couldn't easily test this, although it is syntax checked, so beware.)
>
> Now $src is a reference to a hash similar to your original %src, but it
> has another hash reference as its values, so you can extract
>
>     my $totalsent = $src->{src1}{totalsent};
>
> and so on.
>
> HTH,
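For reference, here is Rob's GROUP BY suggestion pulled together into a
standalone script. This is only a sketch and has not been run against the
real table: the DSN, credentials and example dates are made-up placeholders.
It also binds $fromdate and $todate with ? placeholders rather than
interpolating them into the SQL string, which avoids quoting problems, and
it keeps the direct hash access from Rob's first reply, since
fetchall_hashref returns the summed rows keyed by src.

    #!/usr/bin/perl
    # Sketch of the GROUP BY approach from this thread.
    # Connection details and the date range below are placeholders.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'DBI:mysql:database=syslog;host=localhost',    # assumed DSN
        'user', 'password',                            # assumed credentials
        { RaiseError => 1 },
    );

    my ($fromdate, $todate) = ('2006-09-01', '2006-09-02');  # example range

    # Let MySQL do the summing and grouping; Perl only walks the result.
    my $sth = $dbh->prepare(q(
        SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
        FROM no_name
        WHERE start_date >= ? AND start_date <= ?
        GROUP BY src
    ));
    $sth->execute($fromdate, $todate);

    # One row per source address, keyed by the src column.
    my $src = $sth->fetchall_hashref('src');

    my ($totalsent, $totalrcvd) = (0, 0);
    for my $source (sort keys %$src) {
        $totalsent += $src->{$source}{totalsent};
        $totalrcvd += $src->{$source}{totalrcvd};
        printf "%-20s sent %12d  rcvd %12d\n",
            $source, $src->{$source}{totalsent}, $src->{$source}{totalrcvd};
    }
    print "total sent: $totalsent, total rcvd: $totalrcvd\n";

    $dbh->disconnect;

An index on start_date (if there isn't one already) is what keeps the WHERE
clause cheap as the table grows, since MySQL can then read only the rows in
the date range instead of scanning the whole table.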
Thanks all! After a lot of testing, benchmarking and juggling of which part
Perl or SQL should do, I can clock that query at under 5 secs for almost 2
months of available data. Learned the hard way that Perl must not do
everything by itself.

PS. Still looking for references on how AWStats (or Zabbix or Cacti) do
their reporting on such big databases so quickly.

Cheers,

Toddy Prawiraharjo
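Rob's point about hashes in the first reply above is easy to see with the
core Benchmark module. The comparison below is a toy, run on synthetic data
(10,000 made-up keys, nothing to do with the firewall table): one sub walks
every key with each() the way the original loop did, the other asks the hash
for the key directly.

    #!/usr/bin/perl
    # Toy benchmark: scanning every hash key with each() versus a direct
    # hash lookup. The keys and values are synthetic.
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my %src    = map { ("host$_" => $_) } 1 .. 10_000;
    my $wanted = 'host5000';

    cmpthese(-2, {
        scan_each => sub {
            my $found;
            while (my ($source, $total) = each %src) {
                $found = $total if $source eq $wanted;
            }
            return $found;
        },
        direct => sub {
            return $src{$wanted};
        },
    });

The direct lookup is typically thousands of times faster per call on a hash
this size, which is why the original 15-minute loop collapsed to a few
seconds once the per-row scan was removed.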