FamiLink Admin wrote:
> Hello all.

Hello,

> I am trying to read a log file then look for a website and score.  If
> that website has a score >100, take it and add it to a "Check me" list.

It would be easier to help if we could see examples of valid and invalid log
file entries.

> 1. I don't want to "recheck" so I have a list of checked sites that I
>    want to verify with.
> 2. I don't want duplicates added to the "check me" list.
> 
> Below is what I have.  I have 1 above working but I cannot get the
> duplicates from printing to the list.  The sub testfordup at the bottom
> will print the proper information but if it is in the log again it will
> print it again.
> 
> Also, is there a better way of not printing a line if the line exists in
> the file?
> 
> I am also getting:
> Use of uninitialized value in hash element at line 60.
> and
> Use of uninitialized value in regexp compilation at line 44,
> <AB_EX_FILE> line 14.
> 
> I think this is because the text is www.site.com and the "." is not being
> read as text in the file.
> 
> Any help would be greatly appreciated.
> 
> 
> #!/usr/bin/perl -w
> use strict;
> my $ab_file = 'autobannedsitelist1';
> my $ab_ex_file = 'autobannedsitelistexceptlst';
> my $log_file = 'access.log';
> 
> open ( LOG_FILE, "-|", "tail -n 100000 $log_file" ) or die "No log file exists.\n $! \n";
> {
> # Begin parsing of log data
> my $pass = 0;
> my $dup = 0;
> foreach my $logfile_line (<LOG_FILE>) {

You are using a foreach loop, which evaluates <LOG_FILE> in list context, so
all 100,000 lines of the log file are read into memory before the loop
starts.  It would be more efficient to use a while loop, which reads in scalar
context and only stores a single line at a time in memory.
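A minimal sketch of the while-loop version, assuming the per-line work is
moved into a sub (count_weighted() is a made-up name that just counts matching
lines).  The commented usage also shows a lexical filehandle and the list form
of the pipe open, which avoids the shell:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the lines that mention "Weighted".  A while loop calls readline
# in scalar context, so only the current line is held in $line.
sub count_weighted {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ /Weighted/;
    }
    return $count;
}

# With the original log (needs tail and access.log on the machine):
#   open my $log, '-|', 'tail', '-n', '100000', $log_file
#       or die "Can't run tail on $log_file: $!\n";
#   my $n = count_weighted($log);
#   close $log;

# Self-contained demo: read from an in-memory filehandle instead.
my $sample = "a Weighted 1\nb plain\nc Weighted 2\n";
open my $mem, '<', \$sample or die "Can't open in-memory file: $!";
print count_weighted($mem), "\n";    # prints 2
close $mem;
```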


>    $logfile_line =~ s/GET//;
>    if ($logfile_line =~ m/Weighted/ ){
>        my @logfile_fields = split /\s+/, $logfile_line, 6;
>        my $site = (split /\//, ($logfile_fields[4]))[2]; # get the domain name
>        my $sitemain = (split '\.', $site)[-2]; # get the site name without .com
>        my $sitemain2 = (split '\.', $site)[-3]; # get the site name without .co.uk
>        my $points = (split ' ',(split ':', (substr($logfile_fields[5],35,10)))[1])[0];

You are using split() a lot.  It may be more efficient to use a single regular
expression but it is hard to tell without seeing actual data.
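Without real log lines this is only a guess, but assuming field 4 holds an
absolute URL like http://www.site.com/index.html (a made-up example), one
regex can replace the first split, and a single split of the host can feed
both names.  site_parts() is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: return the host plus the second- and third-to-last
# labels of a URL's host name, or an empty list if no host is found.
sub site_parts {
    my ($url) = @_;
    # One regex replaces split /\// and the [2] subscript.
    my ($host) = $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+)}
        or return;
    # Split the host once and reuse the list for both names.
    my @labels = split /\./, $host;
    # $labels[-3] is undef for a two-label host such as site.com.
    return ( $host, $labels[-2], $labels[-3] );
}

my ( $site, $sitemain, $sitemain2 ) =
    site_parts('http://www.site.com/index.html');
print "$site $sitemain $sitemain2\n";    # prints "www.site.com site www"
```

The points value is left alone because its format is exactly what we can't
see without sample data.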


>        if ($points > 100){
>            $pass = &abextest($sitemain,$sitemain2);

You shouldn't use the '&' sigil with subroutines unless you really have to.

perldoc perlsub
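The main surprise with '&' is shown below: calling &name; with no parentheses
hands the caller's current @_ to the sub, while name() passes exactly what you
list.  (The '&' form also bypasses prototypes.)  count_args() and demo() are
just illustrations:

```perl
use strict;
use warnings;

# Returns how many arguments it received.
sub count_args { return scalar @_ }

sub demo {
    # "&name;" with no parentheses reuses the caller's current @_.
    my $with_amp = &count_args;
    # "name()" passes exactly the arguments listed: none here.
    my $plain = count_args();
    return ( $with_amp, $plain );
}

my ( $with_amp, $plain ) = demo( 'x', 'y' );
print "&count_args; saw $with_amp, count_args() saw $plain\n";
# prints "&count_args; saw 2, count_args() saw 0"
```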


>                if ( $pass eq 0 ){

You are using a string comparison operator on a number which means that perl
has to convert the number to a string:

                if ( $pass == 0 ){
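In this script $pass only ever holds 0 or 1, so eq and == happen to agree,
but they can disagree on other values that are numerically zero:

```perl
use strict;
use warnings;

# '0.0' is numerically zero but is not the string "0".
my $pass = '0.0';

print 'eq: ', ( $pass eq 0 ? 'equal' : 'not equal' ), "\n";   # not equal
print '==: ', ( $pass == 0 ? 'equal' : 'not equal' ), "\n";   # equal
```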


>                    &testfordup($site);
>                }
>            }
>        }
>    }
> close (LOG_FILE);
> }
> 
> sub abextest {
>     my ($sitemain,$sitemain2) = @_;
>     my $pass = 0;
>     open ( AB_EX_FILE, "<", $ab_ex_file ) or die "Can't read AUTOBANNED_EX_FILE: $!";
>     foreach my $line (<AB_EX_FILE>){

You are using a foreach loop which means that all of the file
'autobannedsitelistexceptlst' is stored in memory.  It would be more efficient
to use a while loop which will only store a single line at a time in memory.


>         if (($line =~ m/$sitemain/i ) || ( $line =~ m/$sitemain2/i ))  {

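It looks like this *may* be line 44?  If so, then either $sitemain or
$sitemain2 is undefined.  The "." is not the problem, because split '\.' has
already removed the dots.  One likely cause: for a host with only two labels,
such as site.com, (split '\.', $site)[-3] is undefined, and if $site itself is
undefined (field 4 was not a full URL) then both names are.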
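A sketch of abextest() with a while loop that also skips undefined or empty
names.  It assumes the exception file name is passed in as the first argument
instead of being read from the file-scoped $ab_ex_file.  \Q...\E is only a
safeguard: the names are interpolated into a regex, and \Q keeps any
metacharacters in them literal.

```perl
use strict;
use warnings;

# Sketch: return 1 if any defined, non-empty name appears (case-insensitively,
# as a literal substring) on some line of $ex_file, otherwise 0.
# Assumption: the file name is passed in instead of using a global.
sub abextest {
    my ( $ex_file, @names ) = @_;
    @names = grep { defined $_ && length $_ } @names;
    return 0 unless @names;

    open my $ex_fh, '<', $ex_file
        or die "Can't read $ex_file: $!";
    while ( my $line = <$ex_fh> ) {    # one line in memory at a time
        for my $name (@names) {
            # \Q...\E makes any regex metacharacters in $name literal.
            if ( $line =~ /\Q$name\E/i ) {
                close $ex_fh;
                return 1;
            }
        }
    }
    close $ex_fh;
    return 0;
}
```

In the main loop the call would then be something like
$pass = abextest( $ab_ex_file, $sitemain, $sitemain2 );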


>             $pass = 1;
>         }
>     }
>     close (AB_EX_FILE);
> return $pass;
> }
> 
> sub testfordup {
>     my %seen;
>     open AB_FILE, "<  $ab_file" or die "Can't read AUTOBANNED_FILE: $!";
>     while (<AB_FILE>) { $seen{$_} = 1 }  # build the hash of seen lines
>     close AB_FILE;
> 
>     my ($site) = @_;
>     open AB_FILE, ">> $ab_file" or die "Can't append AUTOBANNED_FILE: $!";
>     print AB_FILE "$site\n" if not ($seen{$_}++) ;

It looks like this *may* be line 60?  If so, then you have not explicitly
stored a value in $_, and the while loop above leaves $_ undefined when it
reaches end of file.  Perhaps you meant to use $seen{$site} instead?  But that
probably wouldn't work either, because you didn't chomp() the lines before
adding them to %seen: the keys end in "\n", so "www.site.com" never matches
"www.site.com\n".
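One way to fix testfordup(), as a sketch: chomp each line before it goes into
%seen and look up $seen{$site}.  The file name is passed in as an argument (an
assumption, to keep the example self-contained), and the sub returns 1 if it
appended the site and 0 if it was already listed.

```perl
use strict;
use warnings;

# Append $site to $ab_file unless that exact line is already there.
# Returns 1 if the site was appended, 0 if it was already listed.
sub testfordup {
    my ( $ab_file, $site ) = @_;
    my %seen;
    # A missing file simply means nothing has been seen yet.
    if ( open my $in, '<', $ab_file ) {
        while ( my $line = <$in> ) {
            chomp $line;    # "www.site.com\n" becomes "www.site.com"
            $seen{$line} = 1;
        }
        close $in;
    }
    return 0 if $seen{$site};

    open my $out, '>>', $ab_file
        or die "Can't append to $ab_file: $!";
    print {$out} "$site\n";
    close $out or die "Can't close $ab_file: $!";
    return 1;
}
```

If the script checks many sites in one run, it may be cheaper to build %seen
once before the log loop and add each new site to it as you append, rather
than rereading the file on every call.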


>     close (AB_FILE);
> return;
> }


John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/

