Andrew Gaffney wrote:
>
> I'm working on a custom Perl script to parse my Apache logs and report custom
> information. When I run the following program, it ends up eating all available
> RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing
> something wrong?

The only problems I can see are that your regular expression is inefficient
and that your AoA (array of arrays) @requests may get very large.

> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use CGI();
>
> my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6, Jul
> => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
> my @requests;
> my $start = time;
> open LOG, "< /var/log/apache/access_log";

You should *ALWAYS* verify that the file opened correctly.

open LOG, '< /var/log/apache/access_log'
    or die "Cannot open '/var/log/apache/access_log' $!";

> while(<LOG>) {
>   my $line = $_;

Why not just:

while ( my $line = <LOG> ) {

Or why use $line at all? The match operator works on $_ by default.

>   $line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?:
> (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;

If you are not using $2, $3, $5, $7 and $10, why capture them? You should
probably replace .+? with something more meaningful that won't backtrack.

>   my ($ip, $date, $request, $requestcode, $bytesreturned, $browser) = ($1, $4,
> $6, $8, $9, $11);

You shouldn't use the numbered capture variables ($1, $4, ...) unless the
regular expression succeeded, or they will just contain the values from the
last successful match.

>   $request = CGI::unescape($request);
>   push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
> }

I would write that while loop as:

while ( <LOG> ) {
    my @fields = /^(\d+\.\d+\.\d+\.\d+) .+? .+? \[(.+?)\] "(?:.+? )?(.+)(?:.+?)?" (\d+) (.+?) ".+?" "(.+?)"$/
        or next;
    $fields[ 2 ] = CGI::unescape( $fields[ 2 ] );
    push @requests, \@fields;
    }

> my $end = time;
> my $elapsed = $end - $start;
> close LOG;
>
> print "$#requests total records. $elapsed seconds elapsed\n";


John
--
use Perl;
program
fulfillment
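
P.S. To illustrate the two points above (a pattern that doesn't backtrack, and not holding every record in memory), here is a minimal sketch. It is not a drop-in replacement for the original script. It assumes the log is in Apache's "combined" format and that quoted fields contain no embedded double quotes. The names parse_line, $log_re and %hits_by_status, and the sample data, are made up for the example. It tallies requests per status code instead of pushing every row onto @requests, on the assumption that the final report only needs totals; it also skips the CGI::unescape step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \S+ and the negated classes ([^\]]+, [^"]*) stop at their delimiter,
# so the engine does not have to retry ever-longer matches as with .+?
my $log_re = qr/^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d+) (\S+) "[^"]*" "([^"]*)"$/;

# Returns ( ip, date, request, status, bytes, browser ) in list context,
# or an empty list when the line does not match the assumed format.
sub parse_line {
    my ($line) = @_;
    return $line =~ $log_re;
}

# Hypothetical sample data so the sketch runs without a real access_log.
my $sample = <<'END';
127.0.0.1 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"
this line is not a log entry
10.0.0.2 - bob [10/Oct/2004:13:55:40 -0700] "GET /missing HTTP/1.0" 404 - "-" "Mozilla/4.08"
END

# Read from the in-memory string; a real run would open the log file here.
open my $log, '<', \$sample or die "Cannot open sample data: $!";

my %hits_by_status;    # status code => number of requests
my $total = 0;

while ( my $line = <$log> ) {
    chomp $line;
    my ( $ip, $date, $request, $status, $bytes, $browser ) = parse_line($line)
        or next;       # skip lines that do not match
    $hits_by_status{$status}++;
    $total++;
}
close $log;

print "$total total records\n";
print "$_: $hits_by_status{$_}\n" for sort keys %hits_by_status;
```

Memory use then depends on the number of distinct keys in the hash rather than on the number of lines in the log. If you really do need every record, the same pattern still applies and only the body of the loop changes.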