Re: perl regexp need help!

Alex Wed, 10 Aug 2005 00:32:32 -0700

On Wednesday 10 August 2005 04:36, you wrote:
> On 8/9/05, Wagner, David --- Senior Programmer Analyst --- WGO
> <[EMAIL PROTECTED]> wrote:
> > Alex wrote:
> > > Hello everyone,
> > >
> > > I need some help fixing a problem in the mailgraph.pl script. I'm not a
> > > Perl programmer, so I hope to find a little help here...
> > >
> > > I need to translate some old code, which parses my maillog file, into
> > > new code that fits my present needs.
> > >
> > > The old code worked with the old Vexira logging style (now
> > > deprecated). Lines in my maillog were something like:
> > >
> > > Aug  7 13:40:28 pharma vgatefwd[1532]: VIRUS bla bla bla
> > >
> > > Here comes old code:
> > >         elsif($prog eq 'vagatefwd') {
> > >                 # Vexira antivirus
> > >                 if($text =~ /^VIRUS/) {
> > >                         event($time, 'virus');
> > >                 }
> > >         }
> > >
> > > The new code (rewritten by me) should work with the new Vexira logging
> > > style, with lines in my maillog like the following:
> > >
> > > Aug  7 13:40:28 pharma hook[2446]: ***** Virus (I-Worm.Netsky.Q1)
> > > killed with file delete!
> > >
> > > Here come the new code....
> > >         elsif($prog eq 'hook') {
> > >                 # Vexira antivirus
> > >                 if($text =~ /^\([\*]+\) Virus\b/) {
> >
> >         No, it won't work for you. You are asking for the start of the line,
> > then a literal paren followed by one or more *, then a closing paren, a
> > space, and the word Virus. You can try something like:
> >
> >                 if($text =~ /\s\*{1,}\s{1,}Virus\b/)
> > where you are looking for a space followed by 1 or more *, 1 or more
> > spaces, then Virus.
> >
> > Wags ;)
> >
> > >                         event($time, 'virus');
> > >                 }
> > >         }
> > >
> > > Is my new code correct? If no, how should it be?
> > >
> > > Regards,
> > > Alex
>
> There must be more going on here. The original regex matches at the
> beginning of the line--'^VIRUS'--so everything up to the space
> following the colon must be stripped before the regex in the if
> conditional gets it.  '$text =~ /^VIRUS/' doesn't match on 'Aug 7 blah
> blah blah'. In that case, simply replacing /^VIRUS/ with /^Virus/
> should work fine. or better yet:
>
> if ( $text =~ /^virus/i ) {
>
> I think we need to see more of the code, though, to be sure of what's going
> on.
>
> HTH,
>
> --jay
> --------------------------------------------------
> This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
> private and confidential
>
> daggerquill [at] gmail [dot] com
> http://www.tuaw.com  http://www.dpguru.com  http://www.engatiki.org


Yes indeed, here is more explanation:

1. Because I don't have the old Vexira software, I will post a working
SpamAssassin example here, which is almost identical to the old vgatefwd
functionality. Here is the mailgraph.pl code:

sub process_line($)
{
        my $sl = shift;
        my $time = $sl->[0];
        my $prog = $sl->[2];
        my $text = $sl->[4];

        if($prog =~ /^postfix\/(.*)/) {
        ........
        }
 
        elsif($prog eq 'vagatefwd') {
                # Vexira antivirus
                if($text =~ /^VIRUS/) {
                        event($time, 'virus');
                }
        }

        elsif($prog eq 'spamd') {
                if($text =~ /^identified spam/) {
                        event($time, 'spam');
                }
        }
}
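
To make the indices in process_line() easier to follow, here is a minimal, self-contained sketch of the spamd branch. It assumes the array-ref layout (timestamp, host, program, pid, text) that _next_syslog builds further down. The event() stub, the %count hash and the timestamp value 0 are placeholders for illustration, not mailgraph's real code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mailgraph's event(): just count by type.
my %count;
sub event {
    my ($time, $type) = @_;
    $count{$type}++;
}

# Simplified dispatcher: only the spamd branch from the code above.
sub process_line {
    my ($sl) = @_;
    # Slots 0, 2 and 4 hold timestamp, program and message text.
    my ($time, $prog, $text) = @{$sl}[0, 2, 4];
    if ($prog eq 'spamd' && $text =~ /^identified spam/) {
        event($time, 'spam');
    }
}

# The two spamd lines from the log, already split into fields.
# The timestamp 0 is a placeholder.
process_line([0, 'pharma', 'spamd', 11623,
    'identified spam (15.3/5.0) for [EMAIL PROTECTED]:12347 in 1.7 seconds, 19833 bytes.']);
process_line([0, 'pharma', 'spamd', 11623, 'result: Y 15 - bla bla bla']);

print "spam events: $count{spam}\n";
```

Only the first of the two lines is counted, because the regex is anchored at the start of the already stripped message text.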

In my /var/log/maillog, when an email is tagged as spam, these lines appear:

Aug  7 05:13:29 pharma spamd[11623]: identified spam (15.3/5.0) for [EMAIL PROTECTED]:12347 in 1.7 seconds, 19833 bytes.
Aug  7 05:13:29 pharma spamd[11623]: result: Y 15 - bla bla bla

2. Now, when a message is infected, Vexira writes the following lines to my
/var/log/maillog:

Aug  7 13:40:28 pharma hook[2446]: Virus scanning in attachment 'noname_1248.txt'
Aug  7 13:40:28 pharma hook[2446]: Scanning object...
Aug  7 13:40:28 pharma hook[2446]: Virus Scanning in: $noname_1248.txt
Aug  7 13:40:28 pharma hook[2446]: Callback entry point
Aug  7 13:40:28 pharma hook[2446]: Callback: scanFound
Aug  7 13:40:28 pharma hook[2446]: ***** Found mutant 'Exploit.IFrame.B' - Killable with delete.
Aug  7 13:40:28 pharma hook[2446]: Callback entry point
Aug  7 13:40:28 pharma hook[2446]: Callback: actionDone
Aug  7 13:40:28 pharma hook[2446]: Callback: actionDone = 2
Aug  7 13:40:28 pharma hook[2446]: ***** Virus (Exploit.IFrame.B) killed with file delete!
Aug  7 13:40:28 pharma hook[2446]: Virus Scanning done in: $noname_1248.txt
Aug  7 13:40:28 pharma hook[2446]: Object scanning done...
Aug  7 13:40:28 pharma hook[2446]: Generated virus toplist 'Daily'

So the interesting lines in /var/log/maillog contain the string "***** Virus"
and are generated by the 'hook' daemon.

3. Now, at the beginning of mailgraph.pl, this is where $prog and $text are defined:

sub _next_syslog($)
{
    my ($self) = @_;
    while($self->{_repeat}>0) {
        $self->{_repeat}--;
        return $self->{_repeat_data};
    }
    line: while(my $str = $self->_next_line()) {
        # date, time and host
        $str =~ /^
            (\S{3})\s+(\d+)   # date  -- 1, 2
            \s
            (\d+):(\d+):(\d+) # time  -- 3, 4, 5
            (?:\s<\w+\.\w+>)? # FreeBSD's verbose-mode
            \s
            ([-\w\.]+)        # host  -- 6
            \s+
            (.*)              # text  -- 7
            $/x or do
        {
            warn "WARNING: line not in syslog format: $str";
            next line;
        };

 ........
        # marks
        next if $text eq '-- MARK --';
        # some systems send over the network their
        # hostname prefixed to the text. strip that.
        $text =~ s/^$host\s+//;
        # discard ':' in HP-UX 'su' entries like this:
        # Apr 24 19:09:40 remedy : su : + tty?? root-oracle
        $text =~ s/^:\s+//;
        $text =~ /^
            ([^:]+?)        # program   -- 1
            (?:\[(\d+)\])?  # PID       -- 2
            :\s+
            (?:\[ID\ (\d+)\ ([a-z0-9]+)\.([a-z]+)\]\ )?   # Solaris 8 "message id" -- 3, 4, 5
            (.*)            # text      -- 6
            $/x or do
        {
            warn "WARNING: line not in syslog format: $str";
            next line;
        };
        if($self->{arrayref}) {
            $self->{_last_data}{$host} = [
                $time,  # 0: timestamp
                $host,  # 1: host
                $1,     # 2: program
                $2,     # 3: pid
                $6,     # 4: text
                ];
        }
        else {
            $self->{_last_data}{$host} = {
                timestamp => $time,
                host      => $host,
                program   => $1,
                pid       => $2,
                msgid     => $3,
                facility  => $4,
                level     => $5,
                text      => $6,
            };
        }
        return $self->{_last_data}{$host};
    }
    return undef;
}
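
As a rough illustration of what the two regexes above do to one log line, here is a simplified sketch. The helper name split_syslog_line is hypothetical, and the FreeBSD and Solaris special cases are left out, so this is not the real parser, only the same two-stage idea.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mailgraph.pl): split one syslog line
# into program, PID and message text in two stages.
sub split_syslog_line {
    my ($line) = @_;

    # Stage 1: skip date and time, capture host and the rest of the line.
    my ($host, $rest) =
        $line =~ /^\S{3}\s+\d+\s\d+:\d+:\d+\s([-\w.]+)\s+(.*)$/
        or return;

    # Stage 2: program name, optional [PID], then ": " and the message text.
    my ($prog, $pid, $text) =
        $rest =~ /^([^:]+?)(?:\[(\d+)\])?:\s+(.*)$/
        or return;

    return ($prog, $pid, $text);
}

my ($prog, $pid, $text) = split_syslog_line(
    'Aug  7 13:40:28 pharma hook[2446]: ***** Virus (Exploit.IFrame.B) killed with file delete!'
);
print "prog=$prog\n";
print "pid=$pid\n";
print "text=$text\n";
```

The point to notice is that $text starts directly with the asterisks: the program name, the PID and the ": " separator are already gone by the time process_line() sees it.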


Finally, I want to mention that the code posted by Wags does not work:

        elsif($prog eq 'hook') {
                # Vexira antivirus
                if($text =~ /\s\*{1,}\s{1,}Virus\b/) {
                        event($time, 'virus');
                }
        }

Also, the correction posted by Jay is not applicable, because several lines
contain the word "Virus" and only one of them (***** Virus) should be counted.
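
One possible explanation, sketched below: since $text arrives with "hook[2446]: " already stripped, there is no whitespace in front of the asterisks, and Wags' pattern requires one. The sketch compares that pattern with an anchored variant, /^\*+\s+Virus\b/, on four of the message texts from the log above. The helper names wags_match and anchored_match are made up for illustration, and the anchored pattern is a suggestion, not something tested inside mailgraph.pl.

```perl
use strict;
use warnings;

# Message texts as process_line() would see them, i.e. with the
# program name, PID and ": " separator already removed.
my @texts = (
    "Virus scanning in attachment 'noname_1248.txt'",
    "***** Found mutant 'Exploit.IFrame.B' - Killable with delete.",
    '***** Virus (Exploit.IFrame.B) killed with file delete!',
    'Virus Scanning done in: $noname_1248.txt',
);

# Hypothetical helper names, for illustration only.
# Wags' pattern: needs a whitespace character before the asterisks.
sub wags_match     { return $_[0] =~ /\s\*{1,}\s{1,}Virus\b/ ? 1 : 0 }
# Anchored variant: asterisks at the very start of the text.
sub anchored_match { return $_[0] =~ /^\*+\s+Virus\b/ ? 1 : 0 }

# Columns: Wags' result, anchored result, message text.
for my $text (@texts) {
    printf "%d %d %s\n", wags_match($text), anchored_match($text), $text;
}
```

If the assumption about $text holds, only the "***** Virus" line matches the anchored pattern, while the "Found mutant" line and the two "Virus scanning" lines are ignored.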

Alex

