Andrew Gaffney wrote:
>
> I'm working on a custom Perl script to parse my Apache logs and report custom
> information. When I run the following program, it ends up eating all available
> RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing
> something wrong?
The only problems I can see are that your regular expression is
inefficient and that your AoA (array of arrays) @requests keeps every
parsed line in memory, so with a ~410MB log it will get very large.
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use CGI();
>
> my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6, Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
> my @requests;
> my $start = time;
> open LOG, "< /var/log/apache/access_log";
You should *ALWAYS* verify that the file opened correctly.
open LOG, '< /var/log/apache/access_log'
    or die "Cannot open '/var/log/apache/access_log': $!";
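On any reasonably recent Perl the usual idiom is the three-argument open with a lexical filehandle, which keeps the mode separate from the filename and closes the handle automatically when it goes out of scope. A minimal sketch; count_lines is a hypothetical helper used only to show the open/die pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in a file, dying with the OS error
# ($!) if the file cannot be opened.
sub count_lines {
    my ($path) = @_;

    # Three-argument open: the mode '<' is not part of the filename string.
    open my $log, '<', $path
        or die "Cannot open '$path': $!";

    my $count = 0;
    $count++ while <$log>;    # reads each line into $_ until EOF
    close $log;

    return $count;
}
```

Calling count_lines('/var/log/apache/access_log') either returns a number or dies with a message that says why the open failed.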
> while(<LOG>) {
> my $line = $_;
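Why not just:
while ( my $line = <LOG> ) {
Why use $line at all? while ( <LOG> ) already reads each line into $_,
and a regex match without =~ works on $_ by default.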
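A small self-contained illustration of working on $_ directly. It reads from an in-memory string (open on a scalar reference) instead of the real log so it runs anywhere; the sample lines are made up:

```perl
use strict;
use warnings;

# An in-memory "file" stands in for the real access_log.
my $data = "10.0.0.1 - - first\n\n10.0.0.2 - - second\n";
open my $log, '<', \$data or die "Cannot open in-memory log: $!";

my @ips;
while (<$log>) {            # each line is read into $_
    chomp;                  # chomp works on $_ by default
    next unless /^(\S+)/;   # so does m//; the blank line is skipped
    push @ips, $1;
}
close $log;

print "@ips\n";             # prints "10.0.0.1 10.0.0.2"
```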
> $line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?: (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;
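If you are not using $2, $3, $5, $7 and $10, why capture them? You
should also replace .+? with something more specific, such as \S+ or
[^"]*, that can only match the characters a field may contain, so the
regex engine has far less backtracking to do.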
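A sketch of a tighter pattern, assuming the log is in Apache's standard "combined" format (IP, ident, user, [date], "request", status, bytes, "referer", "user agent"). Each field is matched by a class that cannot run past its delimiter, and only the six fields your script uses are captured. It does not handle escaped quotes inside the referer or user agent; the sample line is only an example in that format:

```perl
use strict;
use warnings;

my $combined = qr{
    ^(\d+\.\d+\.\d+\.\d+)           # 1: client IP
    \s\S+\s\S+                      #    ident and user (not captured)
    \s\[([^\]]+)\]                  # 2: date
    \s"(?:[A-Z]+\s)?([^"\s]*)       # 3: request path (method skipped)
    (?:\s[^"]*)?"                   #    protocol (not captured)
    \s(\d{3})                       # 4: status code
    \s(\S+)                         # 5: bytes returned, '-' when none
    \s"[^"]*"                       #    referer (not captured)
    \s"([^"]*)"$                    # 6: user agent
}x;

my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
         . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
         . "\n";

if ( my ($ip, $date, $path, $status, $bytes, $agent) = $line =~ $combined ) {
    print "$ip $status $path\n";    # prints "127.0.0.1 200 /apache_pb.gif"
}
```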
> my ($ip, $date, $request, $requestcode, $bytesreturned, $browser) = ($1, $4, $6, $8, $9, $11);
You shouldn't use the numbered variables ($1, $2, etc.) unless the
regular expression succeeded, or they will just contain the values
from the last successful match.
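A minimal sketch of guarding the capture variables: the match result is tested first, and $1 is only read when it succeeded. The input lines here are made up for illustration:

```perl
use strict;
use warnings;

my @lines = ( "10.0.0.1 GET /a\n", "garbage line\n", "10.0.0.2 GET /b\n" );

my @ips;
for my $line (@lines) {
    # Only read $1 after a successful match; a failed match does not reset
    # $1, so it could still hold a value captured from an earlier line.
    next unless $line =~ /^(\d+\.\d+\.\d+\.\d+) /;
    push @ips, $1;
}

print "@ips\n";    # prints "10.0.0.1 10.0.0.2"
```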
> $request = CGI::unescape($request);
> push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
> }
I would write that while loop as:
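while ( <LOG> ) {
    # Captures: IP, date, request path, status, bytes, user agent.
    my @fields = m{ ^(\d+\.\d+\.\d+\.\d+) \s\S+ \s\S+
                    \s\[ ([^\]]+) \]
                    \s" (?:\S+\s)? (\S+) (?:\s[^"]*)? "
                    \s(\d+) \s(\S+)
                    \s"[^"]*" \s"([^"]*)"$ }x
        or next;    # skip lines that don't match
    $fields[ 2 ] = CGI::unescape( $fields[ 2 ] );
    push @requests, \@fields;
}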
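Storing every parsed line is what makes memory grow with the size of the log. If your report only needs totals, you can update a few counters per line and never build @requests at all. A minimal sketch, assuming a hypothetical report of request count, total bytes and requests per status code; the deliberately simple pattern picks out only the status and bytes, and the sample lines stand in for the real access_log:

```perl
use strict;
use warnings;

my $data = join '',
    qq{10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 100 "-" "UA"\n},
    qq{10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /b HTTP/1.0" 404 - "-" "UA"\n},
    qq{10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /c HTTP/1.0" 200 50 "-" "UA"\n};
open my $log, '<', \$data or die "Cannot open in-memory log: $!";

# Memory use stays roughly constant: only these counters are kept.
my ( $total, $bytes, %by_status ) = ( 0, 0 );
while (<$log>) {
    my ( $status, $size ) = /" (\d{3}) (\S+) "/ or next;
    $total++;
    $by_status{$status}++;
    $bytes += $size if $size =~ /^\d+$/;    # '-' means no body was sent
}
close $log;

print "$total requests, $bytes bytes\n";    # prints "3 requests, 150 bytes"
printf "%s: %d\n", $_, $by_status{$_} for sort keys %by_status;
```

With the sample data the last line prints "200: 2" and then "404: 1".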
> my $end = time;
> my $elapsed = $end - $start;
> close LOG;
>
> print "$#requests total records. $elapsed seconds elapsed\n";
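One more bug in the quoted code: $#requests is the index of the last element, not the number of elements, so that print reports one record too few. An array in scalar context gives the count:

```perl
use strict;
use warnings;

my @requests = ( ['10.0.0.1'], ['10.0.0.2'], ['10.0.0.3'] );

# $#requests is the last index, one less than the number of elements.
print "last index: $#requests\n";    # prints "last index: 2"

# Assigning the array to a scalar gives the element count.
my $count = @requests;
print "$count total records\n";      # prints "3 total records"
```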
John
--
use Perl;
program
fulfillment