Re: Need to pull matched string plus a few additional bytes

Arnaldo Guzman Fri, 27 Oct 2006 07:45:47 -0700

On Fri, 2006-10-27 at 09:36 -0400, Phil Miller wrote:
> I am working on my very first program and have run into a bit of a
> roadblock.  I am trying to print a report of users who show up in an IIS
> Log file.  The good news is that the format of the userid is
> WINDOWSDOMAIN\USERID.  The bad news is that it is not always at the same
> place in the IIS Log file due to some variable length fields that come
> before it.  Its location can vary left or right by about 10 bytes.
>
> I read the IIS Log file in one line at a time.  I have gotten far enough
> that I can identify the lines with WINDOWSDOMAIN on it, but am stuck
> there.  The code $userid = substr($logfile_in, 33, 12); gets me close
> but depending on the length of the date, the time or the IP address, it
> is usually off by a few bytes.  A sample of the input is below to
> explain what I am talking about. 
>
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/main.css
> 
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/contents.aspx
> 
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/footer.aspx
>
> Essentially what I need to do is find the WINDOWSDOMAIN on a line, and
> write to a file the matched string plus \USERID data (up to the next
> space).  Does anyone have any suggestions?  I'm thinking there must be
> some very easy way to do it since Perl is made for this sort of thing.
> I remember reading about some Perl built-in capability that would take a
> scalar variable and parse it into an array based on a delimiter, but I
> can't remember what it is.  That would probably do it for me.  But if
> you know of a better way, I'm all ears.
>
> Below is the code I am using.
>
> # My first PERL program.
>
> open USERIDOUT, ">userid.out.txt";
> open IISLOG, "<ex061023.log";
>
> $ctr = 0;
> $hit_counter = 0;
> $miss_counter = 0;
> $logfile_in;
> $userid;
>
> while (<IISLOG>)
> {
>         $logfile_in = $_;
>         if ( ($logfile_in =~ m/WINDOWSDOMAIN/i && $logfile_in =~ m/itd/i) )
>         {
>                 print "\n** Found success\n";
>                 $hit_counter += 1;
>
>                 $userid = substr($logfile_in, 33, 12);  # This is not correct but is somewhat close
>                 print "\n", $userid;
>         }
>         else
>         {
>                 print "Did not find success\n";
>                 $miss_counter += 1;
>         }
> }
>
> print "\n Hit Counter = ", $hit_counter;
> print "\n Miss Counter = ", $miss_counter;
> print "\n Total Records Counter = ", $hit_counter + $miss_counter;
>
> close USERIDOUT;
> close IISLOG;
>
> Phil


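Please be careful with your open() calls. Use the three-argument form
(open my $fh, '<', $file), because the two-argument form can misread
filenames that start with characters like ">" or "|" or that have
leading or trailing spaces, and always check open() for errors. You
also shouldn't use bareword filehandles; lexical filehandles are scoped
to their block and won't clash with other global names. Now, I don't
know whether you want to include the "\" in the second capture, but
here you go: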

--- Using your sample data ---
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/main.css
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/contents.aspx
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/footer.aspx
--- End of sample data ---

#!/usr/bin/perl
use strict;
use warnings;

my $file     = 'path/to/file';
my $out_file = 'file/you/want';

open my $iis_log, '<', $file
        or die "Could not open $file: $!\n";
open my $user_id_out, '>>', $out_file
        or die "Could not open $out_file for writing: $!\n";

while (<$iis_log>) {
        # $1 captures the domain, $2 the backslash plus the user ID
        if (/(\w+)(\\\w+)/) {
                print "Match, printing to $out_file...\n";
                # from the sample data above, you will get something
                # like "WINDOWSDOMAIN \USERID" on each line of the out
                # file:
                print $user_id_out "$1 $2\n";
        }
}

close $iis_log;
close $user_id_out or die "Could not close $out_file: $!\n";

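Each line of the output file will contain two "words": "WINDOWSDOMAIN"
and "\USERID". If you'd rather have the entire matched string
("WINDOWSDOMAIN\USERID"), print "$1$2" instead; note that $_ holds the
whole input line, not just the part that matched. Notice the ">>" mode
when opening the output file: each match is appended to the end of the
file, so running the script twice will add the same lines again.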
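
As for the built-in you were trying to remember, that's split(). Since
the IIS fields are separated by spaces, splitting each line into fields
means the variable length of the date, time, and IP fields no longer
matters. Here is a minimal sketch, assuming no other field on the line
contains a backslash; the helper name user_field and the sample lines
are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first whitespace-separated field that
# contains a backslash (e.g. "WINDOWSDOMAIN\USERID"), or undef if none.
# Assumes no other field on the line contains a backslash.
sub user_field {
    my ($line) = @_;
    # split ' ' splits on runs of whitespace and ignores leading blanks
    my @fields = split ' ', $line;
    # list assignment keeps the first matching field (undef if none)
    my ($user) = grep { /\\/ } @fields;
    return $user;
}

my @sample = (
    '2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80',
    'GET /itd/styles/main.css',
);

for my $line (@sample) {
    my $user = user_field($line);
    print defined $user ? "$user\n" : "no user on this line\n";
}
```

If you know the user ID is always in the same field (the fourth one in
your sample), (split ' ', $_)[3] would also work, but searching for the
backslash doesn't depend on the field order.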

To learn more about regular expressions, see "perldoc perlre" and the
tutorial at http://perldoc.perl.org/perlretut.html (also available as
"perldoc perlretut"), particularly its "Extracting matches" section.
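
As a small taste of that section: a match in list context returns the
captured groups, so you can pull both parts into named variables in one
statement instead of reading $1 and $2 afterwards. This version also
leaves the backslash out of the second capture. A minimal sketch, using
a hypothetical helper domain_and_user and one of your sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (domain, user) for the first DOMAIN\USER
# pair on the line. In list context the match returns the captures; on
# a failed match both variables stay undef.
sub domain_and_user {
    my ($line) = @_;
    my ($domain, $user) = $line =~ /(\w+)\\(\w+)/;
    return ($domain, $user);
}

my $sample = '2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80';
my ($domain, $user) = domain_and_user($sample);
print "domain=$domain user=$user\n" if defined $user;
```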

-- Hope that helps.

