No, I was not expecting anyone to give me a ready-made program! I believe in the adage that "Self Help is the Best Help" :-)
Thanks for the pointer to http://www.oreilly.com/catalog/perlwsmng/chapter/ch08.html. I had already looked at that before posting my message. I also looked at Parse Log, and it looks to me like the report that module generates is not what I am looking for.

Below is the Perl script that I wrote. I am able to count the number of repeated timestamps all right. I am having a problem with the time interval, and that is where I need help. Maybe the algorithm I followed needs to be modified. Please have a look at the script and make any constructive suggestions!

Thanks,
Anand

--------- The script: ------------
#!/usr/bin/perl
use Getopt::Long;
use Time::Local;

my $file="access_log_modified";
my $line;
my $count;
my $begin_time = "";
my $end_time;
my %seen = ();
my @visual_pages = ();
my ($datetime, $get_post, $Day, $Month, $Year, $Hour, $Minute, $Second);
my $interval = 60; #An interval of 1 minute
my @pages_processed;

count_recs();

sub count_recs {
    open (INFILE, "<$file") || die "Cannot read from $file";
    WHILELOOP: while (<INFILE>) {
        $line = $_;
        chomp;
        ($datetime,$get_post) = (split / /) [3,6];
        $datetime =~ s/\[//;
        ($Day,$Month,$Year,$Hour,$Minute,$Second)=
            $datetime =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
        next WHILELOOP if ($get_post =~ /\.js$/ || $get_post =~ /\.gif$/ || $get_post =~ /\.css$/);
        unless ($begin_time) {
            $begin_time = $datetime;
        }
        $end_time = $datetime;
        &calculate_time($begin_time, $end_time);
    } #while
    foreach $visual_page (sort by_seen keys %seen) {
        push (@{$pages_processed{$visual_page}}, $seen{$visual_page});
    }
    foreach $page_processed (sort keys %pages_processed) {
        print "$page_processed: @{$pages_processed{$page_processed}}\n";
    }
    close(INFILE);
}

sub calculate_time {
    my @visual_pages = ();
    my @processed_visual_pages = ();

    ###Break up the date time into Day, Month, Year, Hour, Minute and Second.
    ($begin_Day,$begin_Month,$begin_Year,$begin_Hour,$begin_Minute,$begin_Second)=
        $begin_time =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
    ($end_Day,$end_Month,$end_Year,$end_Hour,$end_Minute,$end_Second)=
        $end_time =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;

    ###Since the Day above is in the Alpha format, Jan, Feb,... and not numeric
    ###format, 01, 02, 03,..., we need to convert it to a numeric format. Otherwise,
    ###we cannot pass Day to timelocal or localtime modules. That's why the
    ###subroutine is called. It converts Jan into 01 and so on.
    &Initialize;
    my $begin_seconds = timelocal($begin_Second, $begin_Minute, $begin_Hour, $begin_Day, $MonthToNumber{$begin_Month}, $begin_Year-1900);
    my $end_seconds = timelocal($end_Second, $end_Minute, $end_Hour, $end_Day, $MonthToNumber{$end_Month}, $end_Year-1900);

    ###elapsed time is the difference between two timestamps of two consecutive
    ###records in the log file.
    my $elapsed = $end_seconds - $begin_seconds;

    ###We check whether the elapsed time is greater than the interval that we
    ###choose, 1 minute or 15 minutes. If yes, then we need to start counting the
    ###records into a new 15 minute interval. If no, count the number of records
    ###in the same interval. Also, reset the begin_time and end_time, for the new
    ###count. Store all the interval periods into an array, processed_visual_pages.
    if ( $elapsed > $interval ){
        $count = 0;
        $begin_time = $end_time;
        $end_time = $datetime;
        push (@processed_visual_pages, $end_time);
    }
    else {
        push (@visual_pages, $end_time);
        foreach $visual_page (@visual_pages) {
            $seen{$visual_page}++;
        }
    }
}

sub Initialize {
    my %MonthToNumber=(
        'Jan', '01', 'Feb', '02', 'Mar', '03', 'Apr', '04',
        'May', '05', 'Jun', '06', 'Jul', '07', 'Aug', '08',
        'Sep', '09', 'Oct', '10', 'Nov', '11', 'Dec', '12',
    );
    my %NumberToMonth=(
        '01', 'Jan', '02', 'Feb', '03', 'Mar', '04', 'Apr',
        '05', 'May', '06', 'Jun', '07', 'Jul', '08', 'Aug',
        '09', 'Sep', '10', 'Oct', '11', 'Nov', '12', 'Dec',
    );
}

sub by_seen () {
    ( $seen{$b} cmp $seen{$a} );
}
----------------

The output I get is:

25/Apr/2003:13:54:02: 3
25/Apr/2003:13:54:19: 2
25/Apr/2003:13:54:22: 4
25/Apr/2003:13:54:34: 3
25/Apr/2003:13:54:38: 5
25/Apr/2003:13:54:41: 3
25/Apr/2003:13:54:43: 6
25/Apr/2003:13:54:44: 3
25/Apr/2003:13:54:46: 5
25/Apr/2003:13:54:47: 2
25/Apr/2003:13:54:48: 3
25/Apr/2003:13:54:50: 7
25/Apr/2003:13:54:51: 4
25/Apr/2003:13:54:53: 2
25/Apr/2003:13:54:58: 3
25/Apr/2003:13:55:01: 2
25/Apr/2003:13:55:02: 4
25/Apr/2003:13:55:05: 4
25/Apr/2003:13:55:08: 1
25/Apr/2003:13:55:14: 3
25/Apr/2003:13:55:15: 1
25/Apr/2003:13:56:13: 5
25/Apr/2003:13:56:27: 5
25/Apr/2003:13:56:35: 4
25/Apr/2003:13:56:40: 4
25/Apr/2003:13:56:45: 1
25/Apr/2003:13:56:51: 5
------------------------

Ramprasad <[EMAIL PROTECTED]> wrote:

Anand Babu wrote:
> Hi all,
>
> I am new to this group. I need help regarding a perl script which
> parses the web log file, access_log.
>

Welcome. This is the most friendly list I have seen.

> The format of the access_log is:
>
> 127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200
> 34906
>
> The goal is to

What do you expect? Someone would write a full program for you to use?
Someone might, but that way it will take you a long time to learn real Perl yourself. The best way to use a newsgroup is to write the code yourself; if you get stuck, post what you have done and what is not working. You will get enough help here.

You seem to have a fairly simple problem. A short algorithm would be: write a function that converts a timestamp to a date-range string, like

    foo('15/Jun/2003:13:54:02') = '15/Jun/2003:13:45:00-15/Jun/2003:14:00:00'
    ...
    # Use a smaller string if you find this range string very long

Now read the file line by line:

    while (<>) {
        ($x,$y,...$timestamp,....) = split;   # Fill in the blanks later
        $hash{foo($timestamp)}++;
    }

Now %hash has all the info you need.

Best of luck,
Ram

PS: BTW, have a look at analog (http://www.analog.cx/) before you write any code yourself. You might find what you want.
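
For illustration, here is a minimal sketch of the bucketing idea Ram describes, assuming a 15-minute interval (any interval that divides 60 evenly would do). The names bucket_start and count_by_interval are hypothetical stand-ins for his foo(), the bucket key is only the start of the range (his "smaller string"), and the sample log lines are invented in the access_log format quoted above. It is one possible way to fill in the blanks, not a definitive implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a timestamp such as '15/Jun/2003:13:54:02'
# to the start of the interval it falls in ('15/Jun/2003:13:45:00').
# Assumes $interval_minutes divides 60 evenly (1, 5, 15, 30, ...).
sub bucket_start {
    my ($timestamp, $interval_minutes) = @_;
    my ($date, $hour, $minute) =
        $timestamp =~ m#^(\d\d/\w\w\w/\d\d\d\d):(\d\d):(\d\d):\d\d#
        or return;    # not a timestamp we recognise
    my $floored = $minute - ($minute % $interval_minutes);
    return sprintf '%s:%s:%02d:00', $date, $hour, $floored;
}

# Count requests per interval; returns a hash ref of bucket => count.
sub count_by_interval {
    my ($lines, $interval_minutes) = @_;
    my %hits;
    for my $line (@$lines) {
        my ($datetime, $path) = (split ' ', $line)[3, 6];
        next unless defined $path;
        next if $path =~ /\.(?:js|gif|css)$/;    # same filter as the posted script
        $datetime =~ s/^\[//;
        my $bucket = bucket_start($datetime, $interval_minutes);
        next unless defined $bucket;
        $hits{$bucket}++;
    }
    return \%hits;
}

# Invented sample lines, for illustration only.
my @sample = (
    '127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200 34906',
    '127.0.0.1 - - [15/Jun/2003:13:58:40 -0100] "GET /yyyy HTTP/1.1" 200 1200',
    '127.0.0.1 - - [15/Jun/2003:13:59:10 -0100] "GET /logo.gif HTTP/1.1" 200 512',
    '127.0.0.1 - - [15/Jun/2003:14:03:17 -0100] "GET /zzzz HTTP/1.1" 200 800',
);

my $hits = count_by_interval(\@sample, 15);

# A plain string sort is chronological only while the month does not change.
print "$_: $hits->{$_}\n" for sort keys %$hits;
```

With the four sample lines this prints a count of 2 for 15/Jun/2003:13:45:00 and 1 for 15/Jun/2003:14:00:00 (the .gif request is skipped). Because each record is assigned to a fixed bucket by flooring its minute, no elapsed-time comparison between consecutive records is needed.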