FamiLink Admin wrote:
> Hello all.
Hello,
> I am trying to read a log file then look for a website and score. If
> that website has a score >100, take it and add it to a "Check me" list.
It would be easier to help if we could see examples of valid and invalid log
file entries.
> 1. I don't want to "recheck" so I have a list of checked sites that I
> want to verify with.
> 2. I don't want duplicates added to the "check me" list.
>
> Below is what I have. I have 1 above working but I cannot get the
> duplicates from printing to the list. The sub testfordup at the bottom
> will print the proper information but if it is in the log again it will
> print it again.
>
> Also, is there a better way of not printing a line if the line exists in
> the file?
>
> I am also getting:
> Use of uninitialized value in hash element at line 60.
> and
> Use of uninitialized value in regexp compilation at line 44,
> <AB_EX_FILE> line 14.
>
> I think this is because the text is www.site.com and the "." is not being
> read as text in the file.
>
> Any help would be greatly appreciated.
>
>
> #!/usr/bin/perl -w
> use strict;
> my $ab_file = 'autobannedsitelist1';
> my $ab_ex_file = 'autobannedsitelistexceptlst';
> my $log_file = 'access.log';
>
> open ( LOG_FILE, "-|", "tail -n 100000 $log_file" ) or die "No log file
> exists.\n $! \n";
> {
> # Begin parsing of log data
> my $pass = 0;
> my $dup = 0;
> foreach my $logfile_line (<LOG_FILE>) {
You are using a foreach loop which means that 100,000 lines of the log file
are stored in memory. It would be more efficient to use a while loop which
will only store a single line at a time in memory.
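As a rough sketch of the while version (the in-memory filehandle and its
three lines are made up so the example runs on its own; in your script the
handle would be LOG_FILE):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the tail pipe: an in-memory file of three lines.
my $data = "first Weighted\nsecond\nthird Weighted\n";
open my $log, '<', \$data or die "Cannot open in-memory file: $!";

my $matches = 0;
# while reads one line per iteration; foreach would read every line first.
while ( my $logfile_line = <$log> ) {
    $matches++ if $logfile_line =~ /Weighted/;
}
close $log;

print "$matches matching lines\n";
```

The loop body stays the same as in your foreach version; only the way the
lines are fetched changes.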
> $logfile_line =~ s/GET//;
> if ($logfile_line =~ m/Weighted/ ){
> my @logfile_fields = split /\s+/, $logfile_line, 6;
> my $site = (split /\//, ($logfile_fields[4]))[2]; # get the domain name
> my $sitemain = (split '\.', $site)[-2]; # get the site name without .com
> my $sitemain2 = (split '\.', $site)[-3]; # get the site name without .co.ku
> my $points = (split ' ',(split ':', (substr($logfile_fields[5],35,10)))[1])[0];
You are using split() a lot. It may be more efficient to use a single regular
expression but it is hard to tell without seeing actual data.
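Purely as an illustration, and assuming that the field holds a URL (the
value below is invented, I have not seen your log), the host name could be
captured with one match instead of a split and a slice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up value standing in for $logfile_fields[4]; the real format is unknown.
my $url = 'http://www.example.co.uk/some/page';

# Capture everything between "//" and the next "/" or ":" (the host name).
my ($site) = $url =~ m{^\w+://([^/:]+)};

print defined $site ? "$site\n" : "no host found\n";
```

If the match fails, $site is undefined, so you can test for that once
instead of getting warnings further down.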
> if ($points > 100){
> $pass = &abextest($sitemain,$sitemain2);
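You shouldn't use the '&' sigil when calling subroutines unless you really
need what it does: it bypasses prototypes and, when called without
parentheses, passes the caller's current @_ along to the subroutine.
perldoc perlsub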
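A small sketch of the surprise that '&' can cause (show_args and caller_sub
are names made up for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns how many arguments it received.
sub show_args { return scalar @_ }

sub caller_sub {
    my $with_amp    = &show_args;     # no parentheses: caller's @_ is passed on
    my $without_amp = show_args();    # explicit, empty argument list
    return ( $with_amp, $without_amp );
}

my ( $with_amp, $without_amp ) = caller_sub( 'a', 'b', 'c' );
print "with ampersand: $with_amp, without: $without_amp\n";
```

Your calls do have parentheses, so they are not affected by the @_
behaviour, but a plain abextest($sitemain, $sitemain2) says the same thing
with less room for accidents.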
> if ( $pass eq 0 ){
You are using a string comparison operator on a number, which means that perl
has to convert the number to a string first. Use the numeric operator instead:
if ( $pass == 0 ){
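With the values your sub returns (0 or 1) both operators happen to agree,
but they can differ. A short sketch with a made-up value that is
numerically zero yet is not the string "0":

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pass = '0.0';    # hypothetical value, chosen only to show the difference

my $string_equal  = ( $pass eq 0 ) ? 1 : 0;    # compares "0.0" with "0"
my $numeric_equal = ( $pass == 0 ) ? 1 : 0;    # compares 0 with 0

print "eq: $string_equal, ==: $numeric_equal\n";
```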
> &testfordup($site);
> }
> }
> }
> }
> close (LOG_FILE);
> }
>
> sub abextest {
> my ($sitemain,$sitemain2)=@_;
> my $pass = 0;
> open ( AB_EX_FILE, "<", $ab_ex_file ) or die "Can't write
> AUTOBANNED_EX_FILE: $!";
> foreach my $line (<AB_EX_FILE>){
You are using a foreach loop which means that all of the file
'autobannedsitelistexceptlst' is stored in memory. It would be more efficient
to use a while loop which will only store a single line at a time in memory.
> if (($line =~ m/$sitemain/i ) || ( $line =~
> m/$sitemain2/i )) {
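It looks like this *may* be line 44? If so then either $sitemain or
$sitemain2 is undefined. That happens when the host name has fewer
dot-separated parts than the index asks for: 'site.com', for example, has no
[-3] element.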
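A sketch of one way to guard against that, using an invented two-part host
name. It also shows \Q...\E, which makes a "." in the interpolated text
match only a literal dot (unquoted, it matches any character, so it is not
the cause of your warning):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $site = 'site.com';    # hypothetical host name with only two parts

my $sitemain  = ( split /\./, $site )[-2];    # 'site'
my $sitemain2 = ( split /\./, $site )[-3];    # undef: there is no third part

my $line = "www.site.com\n";
my @matched;
for my $name ( $sitemain, $sitemain2 ) {
    next unless defined $name;                     # skip missing parts
    push @matched, $name if $line =~ /\Q$name\E/i; # quote metacharacters
}
print "matched: @matched\n";

my $pattern = 'www.site.com';
my $loose   = ( 'wwwXsiteXcom' =~ /$pattern/ )     ? 1 : 0;  # "." matches "X"
my $literal = ( 'wwwXsiteXcom' =~ /\Q$pattern\E/ ) ? 1 : 0;  # "." must be a dot
print "unquoted: $loose, quoted: $literal\n";
```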
> $pass = 1;
> }
> }
> close (AB_EX_FILE);
> return $pass;
> }
>
> sub testfordup {
> my %seen;
> open AB_FILE, "< $ab_file" or die "Can't read AUTOBANNED_FILE: $!";
> while (<AB_FILE>) { $seen{$_} = 1 } # build the hash of seen lines
> close AB_FILE;
>
> my ($site) = @_;
> open AB_FILE, ">> $ab_file" or die "Can't append AUTOBANNED_FILE: $!";
> print AB_FILE "$site\n" if not ($seen{$_}++) ;
It looks like this *may* be line 60? If so then you have not explicitly
stored a value in $_ so it is probably undefined. Perhaps you meant to use
$seen{$site} instead? But that probably wouldn't work either as you didn't
chomp() the data before you added it to %seen.
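Something along these lines should behave the way you want. It is only a
sketch: the list contents and site names are invented, and an in-memory
filehandle stands in for your $ab_file so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical contents of the "check me" file.
my $existing = "www.example.com\nwww.example.org\n";

my %seen;
open my $in, '<', \$existing or die "Cannot read list: $!";
while ( my $line = <$in> ) {
    chomp $line;         # drop the newline so keys are bare site names
    $seen{$line} = 1;
}
close $in;

my @to_append;
for my $site ( 'www.example.org', 'www.example.net', 'www.example.net' ) {
    # Keyed on $site, not $_; the ++ records it so a repeat is skipped.
    push @to_append, $site unless $seen{$site}++;
}

print "$_\n" for @to_append;
```

Building %seen once, outside the sub, would also save re-reading the file
for every site that scores over 100.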
> close (AB_FILE);
> return;
> }
John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall