John W. Krahn wrote:
Andrew Gaffney wrote:

I'm working on a custom Perl script to parse my Apache logs and report custom
information. When I run the following program, it ends up eating all available
RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing
something wrong?

The only problems I can see are that your regular expression is inefficient and that your AoA (array of arrays) @requests may get very large.
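If the report only needs totals (an assumption; the original post does not say what is reported), one way to keep memory flat is to update counters while reading instead of storing every record. A minimal sketch, where the helper name count_statuses and the simplified status pattern are illustrative and not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: read log lines from an open filehandle and keep
# only one counter per HTTP status code, rather than one array element
# per request.
sub count_statuses {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        # Simplified pattern (assumption): the status is the 3-digit
        # field right after the closing quote of the request.
        my ($status) = $line =~ /" (\d{3}) /
            or next;    # skip lines that do not parse
        $count{$status}++;
    }
    return \%count;
}
```

Memory use then depends on the number of distinct keys rather than on the number of lines in the log.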

#!/usr/bin/perl

use strict;
use warnings;

use CGI();

my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6,
               Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
my @requests;
my $start = time;
open LOG, "< /var/log/apache/access_log";

You should *ALWAYS* verify that the file opened correctly.

open LOG, '< /var/log/apache/access_log'
    or die "Cannot open '/var/log/apache/access_log' $!";
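The same check can also be written with the three-argument form of open and a lexical filehandle, a common idiom that the original script does not use. A minimal sketch; the helper name open_log is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): open a file for
# reading and die with the operating system's error message on failure.
sub open_log {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path': $!";
    return $fh;
}

# Usage:
#   my $log = open_log('/var/log/apache/access_log');
#   while ( my $line = <$log> ) { ... }
```

Keeping the mode separate from the path means a file name with leading or trailing special characters is not misread as part of the mode.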

while(<LOG>) {
  my $line = $_;

Why not just:

while ( my $line = <LOG> ) {

Why use $line at all?

  $line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?: (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;

If you are not using $2, $3, $5, $7 and $10, why capture them? You should probably replace .+? with something more specific that won't backtrack, such as \S+ or a negated character class.
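As an illustration of that advice, here is a sketch of a pattern that captures only the six fields the script uses and replaces .+? with negated character classes. It assumes the log is in Apache's "combined" format and that no field contains an embedded double quote; the names $LOG_RE and parse_line are hypothetical.

```perl
use strict;
use warnings;

# Sketch of a tighter pattern. Only the six fields the script keeps are
# captured; everything else is matched without capturing.
my $LOG_RE = qr{
    ^ (\S+)                  # client address
    \s \S+ \s \S+            # identd and user, not captured
    \s \[ ([^\]]+) \]        # timestamp inside square brackets
    \s " (?: [^"\s]+ \s )?   # request method, not captured
         ( [^"\s]* )         # requested path
         (?: \s [^"\s]+ )? " # protocol, not captured
    \s (\d{3})               # status code
    \s (\S+)                 # bytes returned, or a dash
    \s " [^"]* "             # referer, not captured
    \s " ([^"]*) "           # user agent
    \s* \z
}x;

# Returns (ip, date, request, status, bytes, browser), or an empty list
# when the line does not match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $LOG_RE
        or return;
    return @fields;
}
```

Because each piece can only match one kind of character run, the regex engine has far fewer alternatives to try on lines that almost match.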

  my ($ip, $date, $request, $requestcode, $bytesreturned, $browser)
    = ($1, $4, $6, $8, $9, $11);

You shouldn't use the numbered capture variables ($1, $2, ...) unless the regular expression succeeded; otherwise they still hold the values from the last successful match.
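One way to avoid reading stale values is to take the captures from the match in list context and skip the line when nothing matched. A small sketch; the helper name client_addresses is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the leading dotted-quad address of each
# line, skipping lines where the pattern fails.
sub client_addresses {
    my (@lines) = @_;
    my @addresses;
    for my $line (@lines) {
        # In list context a failed match returns an empty list, so the
        # assignment counts zero elements and "or next" skips the line.
        # $1 is never read after a failed match.
        my ($ip) = $line =~ /^(\d+\.\d+\.\d+\.\d+) /
            or next;
        push @addresses, $ip;
    }
    return @addresses;
}
```

Had the loop matched first and then read $1 unconditionally, a non-matching line would have repeated the address from the previous matching line.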

  $request = CGI::unescape($request);
  push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
}

I would write that while loop as:

while ( <LOG> ) {
    my @fields = /^(\d+\.\d+\.\d+\.\d+) .+? .+? \[(.+?)\] "(?:.+? )?(.+)(?:.+?)?" (\d+) (.+?) ".+?" "(.+?)"$/ or next;
    $fields[ 2 ] = CGI::unescape( $fields[ 2 ] );
    push @requests, \@fields;
}

my $end = time;
my $elapsed = $end - $start;
close LOG;

print scalar(@requests), " total records. $elapsed seconds elapsed\n";

John

I originally wrote this program about 6 months ago, when I still had a lot more bad Perl habits. I just started expanding the program without really thinking about any of that. I'll take your suggestions, though.


--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

