Andrew Gaffney wrote:
> 
> I'm working on a custom Perl script to parse my Apache logs and report custom
> information. When I run the following program, it ends up eating all available
> RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing
> something wrong?

The only problems I can see are that your regular expression is
inefficient and that your AoA (array of arrays) @requests may get very large.
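
On the memory side, if the report only needs totals, one option is to tally
as you read instead of keeping every record. A minimal sketch, assuming you
want counts per status code (that is only my guess at the kind of report, and
tally_status is a made-up name, not something from your script):

```perl
use strict;
use warnings;

# Hypothetical helper: count lines per HTTP status code from an open
# filehandle, keeping one counter per distinct status in memory.
sub tally_status {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        # The status is the three-digit field right after the closing
        # quote of the request string.
        next unless $line =~ /" (\d{3}) /;
        $count{$1}++;
    }
    return \%count;
}
```

Memory use then grows with the number of distinct status codes rather than
with the number of lines in the log.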


> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use CGI();
> 
> my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6, Jul
> => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
> my @requests;
> my $start = time;
> open LOG, "< /var/log/apache/access_log";

You should *ALWAYS* verify that the file opened correctly.

open LOG, '< /var/log/apache/access_log'
    or die "Cannot open '/var/log/apache/access_log' $!";
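
With Perl 5.6 or later you could also use the three-argument form of open
with a lexical filehandle. A small sketch; open_log is a hypothetical helper
name, not something from your script:

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return the handle,
# or die with the path and the operating system's error message.
sub open_log {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path': $!";
    return $fh;
}
```

You would then read with while ( my $line = <$fh> ) and the handle is closed
automatically when $fh goes out of scope.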


> while(<LOG>) {
>    my $line = $_;

Why not just:

while ( my $line = <LOG> ) {

Why use $line at all?


>    $line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?:
> (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;

If you are not using $2, $3, $5, $7 and $10, why capture them?  You
should probably replace .+? with something more specific, such as \S+ or
a negated character class like [^"]+, that won't backtrack as much.
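
As a rough illustration of what I mean, here is a sketch of a tighter
pattern. It assumes your log is in the usual "combined" format and that
fields never contain an escaped double quote; $log_re and parse_line are
names I made up for the example:

```perl
use strict;
use warnings;

# Assumed layout: host ident user [time] "request" status bytes "referer" "agent"
my $log_re = qr{
    ^(\d+\.\d+\.\d+\.\d+)      # client IP
    \s \S+ \s \S+              # ident and user, not captured
    \s \[([^\]]+)\]            # timestamp
    \s "(?:[^"\s]+\s)?         # optional method
       ([^"\s]+)               # request path
       (?:\s[^"]*)?"           # optional protocol
    \s (\d+)                   # status code
    \s (\S+)                   # bytes sent
    \s "[^"]*"                 # referer, not captured
    \s "([^"]*)"$              # user agent
}x;

# Returns the six captured fields, or an empty list if the line
# does not match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $log_re or return;
    return @fields;
}
```

Each piece can only match one kind of character, so the engine has far fewer
ways to retry when a line does not fit.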


>    my ($ip, $date, $request, $requestcode, $bytesreturned, $browser) = ($1, $4,
> $6, $8, $9, $11);

You shouldn't use the numeric capture variables ($1, $2, etc.) unless the
regular expression succeeded, or they will just contain the values from
the last successful match.
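
The usual way to guard against that is to test the match before touching $1.
A tiny sketch with a made-up function name and pattern, just to show the
shape:

```perl
use strict;
use warnings;

# Hypothetical example: return the number after "id=", or nothing
# at all when the line does not match.
sub extract_id {
    my ($line) = @_;
    if ( $line =~ /id=(\d+)/ ) {
        return $1;    # only read $1 after a successful match
    }
    return;
}
```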


>    $request = CGI::unescape($request);
>    push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
> }

I would write that while loop as:

while ( <LOG> ) {
    my @fields = /^(\d+\.\d+\.\d+\.\d+) .+? .+? \[(.+?)\] "(?:.+? )?(.+)(?:.+?)?" (\d+) (.+?) ".+?" "(.+?)"$/
        or next;
    $fields[ 2 ] = CGI::unescape( $fields[ 2 ] );
    push @requests, [ @fields ];
}


> my $end = time;
> my $elapsed = $end - $start;
> close LOG;
> 
> print "$#requests total records. $elapsed seconds elapsed\n";


John
-- 
use Perl;
program
fulfillment
