> Toddy Prawiraharjo wrote:
> >
> > From: Rob Dixon [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 29 September 2006 10:04 AM
> > To: beginners@perl.org
> > Subject: Re: Querying very big database
> >>
> >> Toddy Prawiraharjo wrote:
> >>
> >>> Hello all,
> >>>
> >>> I am not sure if I'm on the correct mailing list, but this should be
> >>> simple for you guys. Recently I started a simple Perl program to catch
> >>> all the syslog from my NetScreen firewall and put it into a database,
> >>> so that I can do some analysis on it later (sort of like AWStats). The
> >>> database itself is pretty big, with 600k entries (for 3 weeks of the
> >>> running firewall) worth 80 megs in MySQL. My question is: the analysis
> >>> in my Perl script is very slow. I know the query to MySQL itself is
> >>> pretty quick, less than 2 seconds to return about 40k to 60k results
> >>> to Perl to be analysed.
> >>>
> >>> Here it goes:
> >>>
> >>> <snipped>
> >>> $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
> >>>     '$fromdate' AND start_date <= '$todate';";
> >>> $sth = $dbh->prepare($query);
> >>> $sth->execute() || &errorhere("SQL Error: $DBI::errstr");
> >>> print "$query<br>";
> >>>
> >>> my %src = ();
> >>> my ($totalsent, $totalrcvd, $srcchecked);
> >>>
> >>> while (@row = $sth->fetchrow_array) {
> >>>     $srcchecked = 0;
> >>>     while (($source, $total) = each %src) {
> >>>         if ($source eq "$row[0]") {
> >>>             $srcchecked = 1;
> >>>             $src{$source} = $src{$source} + $row[1];
> >>>         }
> >>>     }
> >>>     if ($srcchecked != 1) {
> >>>         print "$row[0]<br>";
> >>>         $src{$row[0]} = $row[1];
> >>>     }
> >>>     $totalrcvd = $totalsent + $row[1];
> >>>     #$totalrcvd = $totalrcvd + $row[2];
> >>> }
> >>> </snipped>
> >>>
> >>> The while loop doing the analysis takes more than 15 minutes, and that
> >>> is only for a query over 1 day of records ($fromdate-$todate). So, if
> >>> I want Perl to give me a faster result, what's the solution? Did I
> >>> take a fundamentally wrong approach?
> >>
> >> You've missed the whole point of hashes, which is that they will access
> >> a data value directly from a key. What you've written is similar to
> >>
> >>     my $i = 0;
> >>     my $value;
> >>     foreach my $v (@array) {
> >>         if ($i++ == $n) {
> >>             $value = $v;
> >>         }
> >>     }
> >>
> >> instead of
> >>
> >>     $value = $array[$n];
> >>
> >> Try this while loop instead of the one you have and see if you get an
> >> improvement:
> >>
> >>     while (@row = $sth->fetchrow_array) {
> >>
> >>         my $key = $row[0];
> >>
> >>         print "$key<br>" unless exists $src{$key};
> >>         $src{$key} += $row[1];
> >>
> >>         $totalsent += $row[1];
> >>         #$totalrcvd += $row[2];
> >>     }
> >>
> >> HTH,
> >>
> >> Rob
> >
> > THX! I always knew it was my n00b scripting that caused the problem. It
> > is now down to 4 lines inside the while loop, and I'm flying! The
> > processing is down from ~15 minutes to 4 secs! But for a longer date
> > range (2 weeks) it clocked in at 25 wallclock secs (processing ~350k
> > MySQL result rows). Is there any way to process this beast even faster?
> > Any good reading or references on this? I'm just afraid that if this is
> > rolled out to production, the report for a month's activity will take
> > minutes to make!
> >
> > <snipped>
> > $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
> >     >= '$fromdate' AND start_date <= '$todate';";
> >
> > $sth = $dbh->prepare($query);
> > $sth->execute() || &errorhere("SQL Error: $DBI::errstr");
> > #print "$query<br>";
> >
> > my %src = ();
> > my ($totalsent, $totalrcvd);
> >
> > while (@row = $sth->fetchrow_array) {
> >     my $key = $row[0];
> >     $src{$key} += $row[2];
> >     $totalsent = $totalsent + $row[1];
> >     $totalrcvd = $totalrcvd + $row[2];
> > }
> > </snipped>
>
> Hi Toddy
>
> (Please bottom-post your replies. One day everybody in the world will
> know to do this and perl.beginners will be a happy, shiny place!)
>
> Let the database engine do it for you!
>
>     my $query = qq(
>         SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
>         FROM no_name
>         WHERE start_date >= '$fromdate' AND start_date <= '$todate'
>         GROUP BY src
>     );
>
>     my $sth = $dbh->prepare($query);
>     $sth->execute() || errorhere("SQL Error: $DBI::errstr");
>
>     my $src = $sth->fetchall_hashref('src');
>
>     foreach my $srcval (keys %$src) {
>         $totalsent += $src->{$srcval}{totalsent};
>         $totalrcvd += $src->{$srcval}{totalrcvd};
>     }
>
> (I couldn't easily test this, although it is syntax checked, so beware.)
>
> Now $src is a reference to a hash similar to your original %src, but it
> has another hash reference as its values, so you can extract
>
>     my $totalsent = $src->{src1}{totalsent};
>
> and so on.
>
> HTH,
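For reference, here is Rob's GROUP BY suggestion pulled together into a
standalone script. This is only a sketch and has not been run against the
real table: the DSN, credentials and example dates are made-up placeholders.
It also binds $fromdate and $todate with ? placeholders rather than
interpolating them into the SQL string, which avoids quoting problems, and
it keeps the direct hash access from Rob's first reply, since
fetchall_hashref returns the summed rows keyed by src.

    #!/usr/bin/perl
    # Sketch of the GROUP BY approach from this thread.
    # Connection details and the date range below are placeholders.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'DBI:mysql:database=syslog;host=localhost',    # assumed DSN
        'user', 'password',                            # assumed credentials
        { RaiseError => 1 },
    );

    my ($fromdate, $todate) = ('2006-09-01', '2006-09-02');  # example range

    # Let MySQL do the summing and grouping; Perl only walks the result.
    my $sth = $dbh->prepare(q(
        SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
        FROM no_name
        WHERE start_date >= ? AND start_date <= ?
        GROUP BY src
    ));
    $sth->execute($fromdate, $todate);

    # One row per source address, keyed by the src column.
    my $src = $sth->fetchall_hashref('src');

    my ($totalsent, $totalrcvd) = (0, 0);
    for my $source (sort keys %$src) {
        $totalsent += $src->{$source}{totalsent};
        $totalrcvd += $src->{$source}{totalrcvd};
        printf "%-20s sent %12d  rcvd %12d\n",
            $source, $src->{$source}{totalsent}, $src->{$source}{totalrcvd};
    }
    print "total sent: $totalsent, total rcvd: $totalrcvd\n";

    $dbh->disconnect;

An index on start_date (if there isn't one already) is what keeps the WHERE
clause cheap as the table grows, since MySQL can then read only the rows in
the date range instead of scanning the whole table.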
Thanks all! After a lot of testing, benchmarking and juggling of which part
Perl or SQL should do, I can clock that query at under 5 secs for almost 2
months of available data. Learned the hard way that Perl must not do
everything by itself.

PS. Still looking for references on how AWStats (or Zabbix or Cacti) do
their reporting on such big databases so quickly.

Cheers,

Toddy Prawiraharjo
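Rob's point about hashes in the first reply above is easy to see with the
core Benchmark module. The comparison below is a toy, run on synthetic data
(10,000 made-up keys, nothing to do with the firewall table): one sub walks
every key with each() the way the original loop did, the other asks the hash
for the key directly.

    #!/usr/bin/perl
    # Toy benchmark: scanning every hash key with each() versus a direct
    # hash lookup. The keys and values are synthetic.
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my %src    = map { ("host$_" => $_) } 1 .. 10_000;
    my $wanted = 'host5000';

    cmpthese(-2, {
        scan_each => sub {
            my $found;
            while (my ($source, $total) = each %src) {
                $found = $total if $source eq $wanted;
            }
            return $found;
        },
        direct => sub {
            return $src{$wanted};
        },
    });

The direct lookup is typically thousands of times faster per call on a hash
this size, which is why the original 15-minute loop collapsed to a few
seconds once the per-row scan was removed.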