No, I was not expecting anyone to give me a ready-made program! The script
that I wrote is included further down in this email.

Thanks for the pointer. I looked at Apache::ParseLog, and it looks to me as
though the report that this module generates is not what I am looking for.

Here is the Perl script that I wrote. I am able to count the number of
multiple timestamps all right. I am having a problem with the time interval,
and that's where I need help.

Maybe the algorithm I followed needs to be modified. Please have a look at
the script and make any constructive suggestions.

Thanks
Anand
---------
The script:
------------
#!/usr/bin/perl 

use Getopt::Long;
use Time::Local;

my $file="access_log_modified";
my $line;
my $count;
my $begin_time = "";
my $end_time;
my %seen = ();
my @visual_pages = ();
my ($datetime, $get_post, $Day, $Month, $Year, $Hour, $Minute, $Second);
my $interval = 60;  #An interval of 1 minute
my %pages_processed;  #Used as a hash below: timestamp => list of counts

count_recs();

sub count_recs {

   open (INFILE, "<$file") || die "Cannot read from $file: $!";
   WHILELOOP: while (<INFILE>) {
                 $line = $_;
                 chomp;
                 ($datetime,$get_post) = (split / /) [3,6]; 
                 $datetime =~ s/\[//;
                 ($Day,$Month,$Year,$Hour,$Minute,$Second) =
                    $datetime =~ m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;

                    next WHILELOOP if ($get_post =~ /\.js$/  ||
                                       $get_post =~ /\.gif$/ ||
                                       $get_post =~ /\.css$/);
                    
                    unless ($begin_time) {
                       $begin_time = $datetime;
                    }
                    $end_time = $datetime;
                     
                     
                    &calculate_time($begin_time, $end_time); 
   } #while

   foreach $visual_page (sort by_seen keys %seen) {
      push (@{$pages_processed{$visual_page}}, $seen{$visual_page});
          
   }

   foreach $page_processed (sort keys %pages_processed) {
      print "$page_processed: @{$pages_processed{$page_processed}}\n";
   } 

   close(INFILE);
}

sub calculate_time {

   my @visual_pages = ();
   my @processed_visual_pages = ();

###Break up the date time into Day, Month, Year, Hour, Minute and Second.

   ($begin_Day,$begin_Month,$begin_Year,$begin_Hour,$begin_Minute,$begin_Second) =
      $begin_time =~ m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;

   ($end_Day,$end_Month,$end_Year,$end_Hour,$end_Minute,$end_Second) =
      $end_time =~ m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;

###Since the Month above is in the alpha format (Jan, Feb, ...) and not in a
###numeric format, we need to convert it to a number. Otherwise, we cannot
###pass Month to timelocal. That's why the subroutine Initialize is called.
###It sets up the table that maps Jan, Feb, ... to month numbers.

   &Initialize;

   my $begin_seconds = timelocal($begin_Second, $begin_Minute, $begin_Hour,
      $begin_Day, $MonthToNumber{$begin_Month}, $begin_Year-1900);

   my $end_seconds = timelocal($end_Second, $end_Minute, $end_Hour,
      $end_Day, $MonthToNumber{$end_Month}, $end_Year-1900);

###elapsed time is the difference between two timestamps of two consecutive
###records in the log file.
   
   my $elapsed = $end_seconds - $begin_seconds; 

###We check whether the elapsed time is greater than the interval that we
###choose, 1 minute or 15 minutes. If yes, then we need to start counting
###the records into a new 15 minute interval. If no, count the number of
###records in the same interval. Also, reset the begin_time and end_time,
###for the new count. Store all the interval periods into an array,
###processed_visual_pages.
     
   if ( $elapsed > $interval ){
      $count = 0;
      $begin_time = $end_time;
      $end_time   = $datetime;
      push (@processed_visual_pages, $end_time);
   } else {
        push (@visual_pages, $end_time);
        foreach $visual_page (@visual_pages) {
           $seen{$visual_page}++;
        }
     }
}

sub Initialize {
  # No "my" here: calculate_time reads %MonthToNumber after this sub has
  # returned, so the tables must be package variables, not lexicals.
  # timelocal expects months numbered 0..11, so Jan maps to 0.
  %MonthToNumber=(
        'Jan', 0,
        'Feb', 1,
        'Mar', 2,
        'Apr', 3,
        'May', 4,
        'Jun', 5,
        'Jul', 6,
        'Aug', 7,
        'Sep', 8,
        'Oct', 9,
        'Nov', 10,
        'Dec', 11,
    );

  # Inverse lookup (month number => name), built from the table above.
  %NumberToMonth = reverse %MonthToNumber;

}   

sub by_seen {

   # The counts are numbers, so compare with <=> (descending), not cmp.
   $seen{$b} <=> $seen{$a};

}
----------------
The output I get is:

25/Apr/2003:13:54:02: 3
25/Apr/2003:13:54:19: 2
25/Apr/2003:13:54:22: 4
25/Apr/2003:13:54:34: 3
25/Apr/2003:13:54:38: 5
25/Apr/2003:13:54:41: 3
25/Apr/2003:13:54:43: 6
25/Apr/2003:13:54:44: 3
25/Apr/2003:13:54:46: 5
25/Apr/2003:13:54:47: 2
25/Apr/2003:13:54:48: 3
25/Apr/2003:13:54:50: 7
25/Apr/2003:13:54:51: 4
25/Apr/2003:13:54:53: 2
25/Apr/2003:13:54:58: 3
25/Apr/2003:13:55:01: 2
25/Apr/2003:13:55:02: 4
25/Apr/2003:13:55:05: 4
25/Apr/2003:13:55:08: 1
25/Apr/2003:13:55:14: 3
25/Apr/2003:13:55:15: 1
25/Apr/2003:13:56:13: 5
25/Apr/2003:13:56:27: 5
25/Apr/2003:13:56:35: 4
25/Apr/2003:13:56:40: 4
25/Apr/2003:13:56:45: 1
25/Apr/2003:13:56:51: 5
------------------------
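
One way the interval step might be restructured, offered only as a minimal
sketch and not as a tested solution: convert every timestamp to epoch
seconds, count the hits per second, and then assign each second to a
fixed-width bucket measured from the first timestamp. The names
stamp_to_epoch, interval_stats and %month_index are made up for this
illustration. The sketch assumes the log is in chronological order, ignores
the time zone offset (it uses timegm instead of timelocal, since only
differences between timestamps matter), and assumes that the average, minimum
and maximum are meant to be taken over the per-timestamp page counts inside
each interval.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use List::Util qw(min max sum);
use POSIX qw(strftime);

# Month names as they appear in access_log, mapped to the 0-based
# month numbers that Time::Local expects (Jan = 0 ... Dec = 11).
my %month_index;
@month_index{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Convert "25/Apr/2003:13:54:02" to epoch seconds, or return undef if
# the string does not look like an access_log timestamp.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hour, $min, $sec) =
        $stamp =~ m#^(\d\d)/(\w\w\w)/(\d{4}):(\d\d):(\d\d):(\d\d)#
        or return;
    return unless exists $month_index{$mon};
    return timegm($sec, $min, $hour, $day, $month_index{$mon}, $year);
}

# Group timestamps into intervals of $interval seconds, starting at the
# first timestamp. Returns a list of [start_epoch, avg, min, max], where
# the statistics are taken over the per-second hit counts in the interval.
sub interval_stats {
    my ($interval, @stamps) = @_;
    my (%per_second, $first);
    for my $stamp (@stamps) {
        my $epoch = stamp_to_epoch($stamp);
        next unless defined $epoch;          # skip unparsable lines
        $first = $epoch unless defined $first;
        $per_second{$epoch}++;
    }
    return unless defined $first;

    my %bucket;    # bucket number => list of per-second counts
    for my $epoch (sort { $a <=> $b } keys %per_second) {
        my $n = int(($epoch - $first) / $interval);
        push @{ $bucket{$n} }, $per_second{$epoch};
    }

    my @rows;
    for my $n (sort { $a <=> $b } keys %bucket) {
        my @counts = @{ $bucket{$n} };
        push @rows, [ $first + $n * $interval,
                      sum(@counts) / scalar(@counts),
                      min(@counts), max(@counts) ];
    }
    return @rows;
}

# Usage with a few timestamps taken from the output above.
my @sample = qw(
    25/Apr/2003:13:54:02 25/Apr/2003:13:54:02 25/Apr/2003:13:54:02
    25/Apr/2003:13:54:19
    25/Apr/2003:13:55:08
    25/Apr/2003:13:55:14 25/Apr/2003:13:55:14
);
for my $row (interval_stats(60, @sample)) {
    my ($start, $avg, $min, $max) = @$row;
    printf "%s %.1f %d %d\n",
        strftime("%d/%b/%Y:%H:%M:%S", gmtime($start)), $avg, $min, $max;
}
```

In the real script the timestamps would come from the while loop over the
log file, after the .js, .gif and .css requests have been skipped. Because
the bucket number is computed from the first timestamp, no begin_time needs
to be reset while reading the file.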

-----Original Message-----
From: Rai,Dharmender [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 18, 2003 1:30 AM
To: '[EMAIL PROTECTED]'; 'Anand Ayyagary'
Subject: RE: Parsing the Apache web log file, access_log


The module Apache::ParseLog would help you!

> ----------
> From:         Anand Ayyagary[SMTP:[EMAIL PROTECTED]
> Sent:         Wednesday, June 18, 2003 1:02 AM
> To:   '[EMAIL PROTECTED]'
> Subject:      Parsing the Apache web log file, access_log
> 
> Help needed for Perl script 
> Hi all, 
> 
> I am new to this group. I need help regarding a Perl script which parses
> the web log file, access_log. 
> 
> The format of the access_log is: 
> 
> 127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200 34906 
> 
> The goal is to 
> 
> 1. Perform a count of the pages for the given timestamp. It is possible
> that multiple pages exist with the same timestamp (as with the timestamp
> I mentioned above). 
> 2. Within a given time interval, say, 15 minutes starting with the
> timestamp of the first line in the log file, I would like to compute the
> average number of pages and the minimum and maximum number of pages in
> that interval. 
> 
> 3. I would like the output as below. Following is just an example. 
> 
> Time                  Average Pages  Min Pages  Max Pages
> --------------------  -------------  ---------  ---------
> 15/Jun/2003:14:09:02  6.5            3          10
> 15/Jun/2003:14:24:02  5.5            4          7
> 
> 
> I shall appreciate an early response. 
> 
> Thanks in advance 
> 
> Regards 
> Anand
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 