Andrew Gaffney wrote:
I'm working on a custom Perl script to parse my Apache logs and report custom information. When I run the following program, it ends up eating all available RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing something wrong?
The only problems I can see are that your regular expression is inefficient and that your AoA (array of arrays) @requests may get very large, since it holds every parsed line of the log in memory.
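Because @requests keeps one array reference per log line, memory use grows with the size of the log. If the report only needs totals, one common alternative is to aggregate while reading and keep only the totals. The sketch below assumes the Apache combined log format. The per-IP hit and byte counts (%hits_by_ip, %bytes_by_ip) are hypothetical stand-ins for whatever the report actually computes, and an in-memory string replaces the real access_log so the example runs anywhere:

```perl
use strict;
use warnings;

# Hypothetical sample lines; in the real script this would be the
# opened access_log filehandle instead of an in-memory string.
my $sample = <<'END';
10.0.0.1 - - [10/Oct/2004:13:55:36 -0500] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0"
10.0.0.2 - - [10/Oct/2004:13:55:37 -0500] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0"
10.0.0.1 - - [10/Oct/2004:13:55:38 -0500] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/4.0"
END

open my $log, '<', \$sample or die "Cannot open in-memory log: $!";

my ( %hits_by_ip, %bytes_by_ip );
my $lines = 0;
while ( my $line = <$log> ) {
    # Capture only the fields this report uses; skip lines that don't parse.
    my ( $ip, $bytes ) = $line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ (\S+)/
        or next;
    $lines++;
    $hits_by_ip{$ip}++;
    # Apache logs "-" when no body bytes were sent.
    $bytes_by_ip{$ip} += $bytes eq '-' ? 0 : $bytes;
}
close $log;

# Memory now grows with the number of distinct IPs, not the number of lines.
for my $ip ( sort keys %hits_by_ip ) {
    print "$ip: $hits_by_ip{$ip} hits, $bytes_by_ip{$ip} bytes\n";
}
```

If the full per-request detail really is needed later, writing it out as it is parsed (or into a database) keeps it out of RAM as well.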
#!/usr/bin/perl
use strict; use warnings;
use CGI();
my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6,
               Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
my @requests;
my $start = time;
open LOG, "< /var/log/apache/access_log";
You should *ALWAYS* verify that the file opened correctly.
open LOG, '< /var/log/apache/access_log' or die "Cannot open '/var/log/apache/access_log' $!";
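A minimal sketch of the same check using the three-argument form of open and a lexical filehandle. The temporary file (created with File::Temp) stands in for the real log path; that substitution is an assumption made only so the example runs anywhere:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for /var/log/apache/access_log.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} qq{127.0.0.1 - - [01/Jan/2004:00:00:00 -0600] "GET / HTTP/1.0" 200 42 "-" "test"\n};
close $tmp_fh;

# 'or' binds more loosely than the list operator open, so die runs
# only when open itself returns false.
open my $log, '<', $path or die "Cannot open '$path': $!";
my $count = 0;
$count++ while <$log>;
close $log;
print "$count line(s) read from $path\n";

# A missing file makes open return false and sets $! to the system error.
my $missing = "$path.does-not-exist";
my $ok = open my $nope, '<', $missing;
print "open failed as expected: $!\n" unless $ok;
```

Note that writing "||" instead of "or" without parentheses around open's arguments would bind die to the filename string and silently skip the check.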
while(<LOG>) {
my $line = $_;
Why not just:
while ( my $line = <LOG> ) {
Why use $line at all?
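For reference, Perl treats while ( my $line = <FH> ) as while ( defined( my $line = <FH> ) ), so a final line that is just "0" is not mistaken for end of file, and the bare while ( <FH> ) form reads into $_, which m// and chomp use by default. A small sketch, with an in-memory filehandle standing in for LOG:

```perl
use strict;
use warnings;

# The last "line" is a bare 0 with no newline: false, but defined.
my $data = "first\nsecond\n0";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @seen;
while ( my $line = <$fh> ) {    # same as: while ( defined( my $line = <$fh> ) )
    chomp $line;
    push @seen, $line;
}
close $fh;

# Dropping $line altogether: the loop reads into $_ instead.
open $fh, '<', \$data or die "Cannot reopen in-memory data: $!";
my $count = 0;
while (<$fh>) {
    $count++ if /\S/;           # m// matches against $_ by default
}
close $fh;

print scalar(@seen), " lines kept; last one is '$seen[-1]'\n";
```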
$line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?: (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;
If you are not using $2, $3, $5, $7 and $10, why capture them? You should probably replace .+? with something more specific (such as \S+ for space-separated fields, or [^"]* inside quotes) that won't backtrack as much.
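One possible tighter pattern, sketched below, replaces each .+? with a character class that cannot run past the next delimiter and captures only the six fields the script keeps. It assumes the Apache combined log format and that the quoted fields contain no escaped quotes; the sample line is made up:

```perl
use strict;
use warnings;

# /x allows comments; literal spaces are written as "\ ".
my $combined = qr{
    ^(\S+)              # client IP
    \ \S+\ \S+          # identd and user, not captured
    \ \[([^\]]*)\]      # date
    \ "(?:\S+\ )?       # opening quote and method, not captured
    ([^"]*?)            # request path
    (?:\ \S+)?"         # protocol, not captured, and closing quote
    \ (\d{3})           # status code
    \ (\S+)             # bytes returned, or "-" if none
    \ "[^"]*"           # referer, not captured
    \ "([^"]*)"         # user agent
    $
}x;

my $sample = '10.1.2.3 - - [12/Mar/2004:10:00:00 -0600] '
           . '"GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/4.0"';

my @fields = $sample =~ $combined;
print join( '|', @fields ), "\n";
# prints: 10.1.2.3|12/Mar/2004:10:00:00 -0600|/index.html|200|1234|Mozilla/4.0
```

Because [^"]*? and the optional protocol group are both bounded by the closing quote, the request path no longer swallows the protocol the way a greedy (.+) does.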
my ($ip, $date, $request, $requestcode, $bytesreturned, $browser) = ($1, $4, $6, $8, $9, $11);
You shouldn't use the numbered capture variables ($1, $2, ...) unless the regular expression succeeded, or they will just contain the values from the last successful match.
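A short demonstration of the stale-capture problem and one way around it: capture into lexicals and test the list assignment, which is false when the match returns no captures. The sample strings are hypothetical:

```perl
use strict;
use warnings;

my $good = '10.0.0.1 GET /a';
my $bad  = 'garbage line';

$good =~ /^(\S+) \S+ (\S+)$/;
$bad  =~ /^(\S+) \S+ (\S+)$/;    # fails, but does not reset $1
my $stale_ip = $1;               # still '10.0.0.1', left over from $good

# Safer: a list assignment in boolean context yields the number of
# values the match returned, so a failed match counts as false.
my ( @parsed, @skipped );
for my $line ( $good, $bad ) {
    if ( my ( $ip, $path ) = $line =~ /^(\S+) \S+ (\S+)$/ ) {
        push @parsed, "$ip $path";
    }
    else {
        push @skipped, $line;
    }
}
print "stale \$1: $stale_ip; parsed: @parsed; skipped: @skipped\n";
```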
$request = CGI::unescape($request);
push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
}
I would write that while loop as:
while ( <LOG> ) {
    my @fields = /^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[([^\]]+)\] "(?:\S+ )?(.+?)(?: \S+)?" (\d+) (\S+) "[^"]*" "([^"]*)"$/ or next;
    $fields[ 2 ] = CGI::unescape( $fields[ 2 ] );
    push @requests, [ @fields ];
}
my $end = time;
my $elapsed = $end - $start;
close LOG;
print scalar(@requests), " total records. $elapsed seconds elapsed\n";
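A related detail: in Perl, $#array is the index of the last element, which is one less than the number of elements, so a record count should come from the array in scalar context. A minimal illustration with made-up data:

```perl
use strict;
use warnings;

my @requests = ( ['10.0.0.1'], ['10.0.0.2'], ['10.0.0.3'] );

my $last_index = $#requests;          # index of the last element: 2
my $count      = scalar @requests;    # number of elements: 3
print "$count total records (last index $last_index)\n";
```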
John
I originally wrote this program about 6 months ago, when I still had a lot more bad Perl habits. I just started expanding on it without really thinking about all that. I'll take your suggestions, though.
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548
-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>