FamiLink Admin wrote:
> Hello all.

Hello,

> I am trying to read a log file then look for a website and score. If
> that website has a score >100, take it and add it to a "Check me" list.

It would be easier to help if we could see examples of valid and invalid
log file entries.

> 1. I don't want to "recheck" so I have a list of checked sites that I
> want to verify with.
> 2. I don't want duplicates added to the "check me" list.
>
> Below is what I have. I have 1 above working but I cannot get the
> duplicates from printing to the list. The sub testfordup at the bottom
> will print the proper information but if it is in the log again it will
> print it again.
>
> Also, is the a better way of not printing a line if the line exists in
> the file?
>
> I am also getting:
> Use of uninitialized value in hash element at line 60.
> and
> Use of uninitialized value in regexp compilation at line 44,
> <AB_EX_FILE> line 14.
>
> I think this is because the text is www.site.com an the "." is not being
> read as text in the file.
>
> Any help would be greatly appreciated.
>
>
> #!/usr/bin/perl -w
> use strict;
> my $ab_file = 'autobannedsitelist1';
> my $ab_ex_file = 'autobannedsitelistexceptlst';
> my $log_file = 'access.log';
>
> open ( LOG_FILE, "-|", "tail -n 100000 $log_file" ) or die "No log file
> exists.\n $! \n";
> {
> # Begin parsing of log data
> my $pass = 0;
> my $dup = 0;
> foreach my $logfile_line (<LOG_FILE>) {

You are using a foreach loop which means that 100,000 lines of the log
file are stored in memory. It would be more efficient to use a while loop
which will only store a single line at a time in memory.

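Something along these lines, as a rough sketch only. The sub name
count_weighted and the sample data are made up for illustration, and an
in-memory filehandle stands in for the real "tail" pipe so the snippet
runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines that mention "Weighted",
# reading the handle one line at a time.
sub count_weighted {
    my ($fh) = @_;
    my $count = 0;
    # while() asks for one line per iteration; foreach (<$fh>) would
    # first read every line into a list.
    while ( my $line = <$fh> ) {
        $count++ if $line =~ /Weighted/;
    }
    return $count;
}

# Made-up sample data standing in for the output of tail.
my $sample = "GET a Weighted x\nGET b\nGET c Weighted y\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print count_weighted($fh), "\n";
close $fh;
```

The same loop shape should work unchanged on the LOG_FILE pipe and on
AB_EX_FILE.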
> $logfile_line =~ s/GET//;
> if ($logfile_line =~ m/Weighted/ ){
> my @logfile_fields = split /\s+/, $logfile_line, 6;
> my $site = (split /\//, ($logfile_fields[4]))[2]; # get the
> domain name
> my $sitemain = (split '\.', $site)[-2]; # get the site name
> without .com
> my $sitemain2 = (split '\.', $site)[-3]; # get the site name
> without .co.ku
> my $points = (split ' ',(split ':',
> (substr($logfile_fields[5],35,10)))[1])[0];

You are using split() a lot. It may be more efficient to use a single
regular expression but it is hard to tell without seeing actual data.

> if ($points > 100){
> $pass = &abextest($sitemain,$sitemain2);

You shouldn't use the '&' sigil with subroutines unless you really have to.

perldoc perlsub

> if ( $pass eq 0 ){

You are using a string comparison operator on a number which means that
perl has to convert the number to a string. Use the numeric comparison
instead:

if ( $pass == 0 ){

> &testfordup($site);
> }
> }
> }
> }
> close (LOG_FILE);
> }
>
> sub abextest {
> my ($sitemain,$sitemain2)=@_;
> my $pass = 0;
> open ( AB_EX_FILE, "<", $ab_ex_file ) or die "Can't write
> AUTOBANNED_EX_FILE: $!";
> foreach my $line (<AB_EX_FILE>){

You are using a foreach loop which means that all of the file
'autobannedsitelistexceptlst' is stored in memory. It would be more
efficient to use a while loop which will only store a single line at a time
in memory.

> if (($line =~ m/$sitemain/i ) || ( $line =~
> m/$sitemain2/i )) {

It looks like this *may* be line 44? If so then either $sitemain or
$sitemain2 is undefined. For instance, a host such as site.com has only
two dot-separated parts, so (split '\.', $site)[-3] returns undef.

> $pass = 1;
> }
> }
> close (AB_EX_FILE);
> return $pass;
> }
>
> sub testfordup {
> my %seen;
> open AB_FILE, "< $ab_file" or die "Can't read AUTOBANNED_FILE: $!";
> while (<AB_FILE>) { $seen{$_} = 1 } # build the hash of seen lines
> close AB_FILE;
>
> my ($site) = @_;
> open AB_FILE, ">> $ab_file" or die "Can't append AUTOBANNED_FILE: $!";
> print AB_FILE "$site\n" if not ($seen{$_}++) ;

It looks like this *may* be line 60?
If so then you have not explicitly stored a value in $_ so it is probably
undefined. Perhaps you meant to use $seen{$site} instead? But that
probably wouldn't work either as you didn't chomp() the data before you
added it to %seen.

> close (AB_FILE);
> return;
> }

John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
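
P.S. To illustrate the $seen{$site} and chomp() point above, here is a
minimal sketch of the duplicate check. The sub name new_sites and the
example host names are invented for the demonstration, and it works on
plain arrays rather than your files so it can be run as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the lines already in the "check me" file and
# the candidate sites from the log, return only the sites not yet listed,
# each at most once.
sub new_sites {
    my ( $existing_lines, $candidates ) = @_;
    my %seen;
    for my $line (@$existing_lines) {
        chomp( my $site = $line );    # drop the newline so keys match bare names
        $seen{$site} = 1;
    }
    # $seen{$_}++ is false only the first time a site is encountered.
    return grep { !$seen{$_}++ } @$candidates;
}

# Made-up data: one site already on file, one new site logged twice.
my @on_file  = ("www.old.example\n");
my @from_log = ( 'www.new.example', 'www.old.example', 'www.new.example' );
print "$_\n" for new_sites( \@on_file, \@from_log );
```

Keying the hash on the chomp()ed site name rather than on $_ is what
lets the lookup match. Building %seen once, instead of re-reading the
file for every site, would probably also be cheaper, though that is a
guess without seeing your data.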