On Wed, May 5, 2010 at 7:21 PM, Akhthar Parvez K <akht...@sysadminguide.com> wrote:
> On Wednesday 05 May 2010, Rob Coops wrote:
> > Would it not be more efficient to reset the file handle to the start of
> > the file?
> >
> > use strict;
> > use warnings;
> > use Fcntl qw(:seek);
> > ...
> > foreach my $condition (@conditions) {
> >     seek ( $fh, 0, 0 ) or die "ERROR: Could not reset file handle\n";
> >     while (<$fh>) {
> >         my $line = $_;
> >         ...
> >     }
> > }
> >
> > That way there is no need to re-open the file handle over and over
> > again, saving on that IO overhead. Of course you then need to deal with
> > the possibility of the file being rotated etc. if this is a long running
> > script, but if this is a script that is started every now and then, say
> > once every 5 or 10 minutes, then you should be fine doing this.
> >
> > If you want this script to run all the time in a never ending loop and
> > deal with rotating files due to size/dates etc., then reopening the file
> > is the simplest way of doing this.
>
> He may also avoid opening the filehandle again by doing this:
>
> eg:-
>
> while ($line = <$fh>)
> {
>     if ($line =~ /$condition1/) { print $fh_out1 "$line"; }
>     elsif ($line =~ /$condition2/) { print $fh_out1 "$line"; }
> }
>
> However, this wouldn't be appropriate if there are more conditions. So the
> big question is, which way is better if we need to run a check (grep or
> regex) against a file multiple times in a program?
>
> 1. Storing the file in a list or a variable at the beginning - but what
>    if the file is huge?
> 2. Opening filehandles again and again - would cause I/O overhead
> 3. Any other method?
>
> --
> Regards,
> Akhthar Parvez K
> http://Tips.SysAdminGUIDE.COM
> UNIX is basically a simple operating system, but you have to be a genius to
> understand the simplicity - Dennis Ritchie

A file never starts life being huge, but logs certainly tend to grow, and
they are not always kept in check properly, so assume they will be massive
(I've seen flat text logs that grew by 1GB or more per day). Assuming that
the file will always be relatively small because it is small at this point
in time is very dangerous, especially if you have little or no control over
that file yourself.

An if/elsif construction is nice for a few conditions, but it will not work
for, say, 50 of them; even at 10 the script becomes hard to read. Of course
one could construct a hash where the key is the condition and the value is
an array of the lines that match that condition, but this runs into the same
memory problem again. The only way to do this and not have to worry about
memory issues is by resetting the file handle to the start of the file at
the start of the loop. That way you avoid any memory problems and you avoid
having to open and close the file, say, 50 times.

One other thing I would like to remark on: when you are monitoring a log or
anything else, just reporting errors is not a good idea. You want to report
both the good and the bad. Think of the following situation: your machine's
logs are flooded with errors, but due to a configuration error your
scheduler is not starting the monitoring script, or the script is started
but due to a permissions problem it cannot access the file, or it cannot
find a path to the mail server due to a routing issue, etc. You will never
know that your logs are being spammed with errors, since not getting a mail
means everything is fine, right?
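
To make the two points above a bit more concrete, here is a rough sketch of
how they could fit together: rewind one handle before each pass, count the
matches per condition, and build a report that is never empty. The helper
names count_matches and build_report are made up for this example, the
in-memory string only stands in for a real log file, and actually mailing
the report is left out, so treat it as an illustration under those
assumptions rather than a finished script.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: count matching lines per condition, rewinding the
# same handle before each pass instead of reopening the file.
sub count_matches {
    my ($fh, @conditions) = @_;
    my %count;
    foreach my $condition (@conditions) {
        seek($fh, 0, SEEK_SET)
            or die "ERROR: Could not reset file handle: $!\n";
        $count{$condition} = 0;
        while (my $line = <$fh>) {
            $count{$condition}++ if $line =~ /$condition/;
        }
    }
    return \%count;
}

# Hypothetical helper: build a report that always has content, so that a
# missing report is itself a sign that something is wrong.
sub build_report {
    my ($count, @conditions) = @_;
    my @lines = map { "$_: $count->{$_} matching line(s)" } @conditions;
    my $total = 0;
    $total += $count->{$_} for @conditions;
    push @lines, 'Nothing to report.' if $total == 0;
    return join("\n", @lines) . "\n";
}

# Demo data: an in-memory file stands in for the real log (assumption).
my $log = "INFO start\nERROR disk full\nWARN slow\nERROR raid degraded\n";
open(my $fh, '<', \$log) or die "ERROR: Could not open log: $!\n";
my @conditions = ('ERROR', 'WARN', 'FATAL');
my $count = count_matches($fh, @conditions);
close($fh);
print build_report($count, @conditions);
```

Only one line is held in memory at a time, but the file is read once per
condition, so this trades extra reads for a small, constant memory
footprint.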

I have been in a situation where a critical production system had been
screaming that the restoration of its RAID array had caused the database on
it to become inconsistent, but due to an error in the routing tables the
error mails were being sent out from an incorrect interface and the mail
server could not be reached. It took over 3 months before the system went
down in a catastrophic failure, at which point there was no other solution
than to rebuild the whole database. All of that just because positive news
was not being sent out.

So if you want to rely on mails sent from the machine to monitor logs, make
sure you send a mail even if all it says is that there is nothing to
report. ;-)

Rob