Hi, Jerry, :)

On Tue, 10 Dec 2002, Jerry Rocteur wrote:

> Anyone know how to write a grep -C in Perl ?

Here is a script that mimics just the '-C' feature of GNU grep.  It
doesn't support the fanciness of printing filenames when multiple
input files are specified.

This may be overkill for your situation, but you can specify an
arbitrary number of context lines.  Also, this script (unlike all
others mentioned, I believe) will never print duplicate lines in case
the context of two matches overlaps.
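
To make the overlap rule concrete, here is a rough sketch, separate from the script below, of which line ranges end up printed once overlapping contexts are merged. The helper name merged_ranges and the 100-line file length are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given a context size, the number of the last line
# in the file, and a list of 1-based match line numbers, return the
# merged [start, end] ranges that would be printed.
sub merged_ranges {
    my ($num_ctxt, $last_line, @matches) = @_;
    my @ranges;
    for my $m (sort { $a <=> $b } @matches) {
        my $start = $m - $num_ctxt;
        $start = 1 if $start < 1;
        my $end = $m + $num_ctxt;
        $end = $last_line if $end > $last_line;
        if (@ranges && $start <= $ranges[-1][1] + 1) {
            # Overlaps (or touches) the previous range: extend it.
            $ranges[-1][1] = $end if $end > $ranges[-1][1];
        } else {
            push @ranges, [$start, $end];
        }
    }
    return @ranges;
}

# Matches on lines 10 and 12, 5 lines of context, in a 100-line file.
print join(', ', map { "$_->[0]-$_->[1]" } merged_ranges(5, 100, 10, 12)), "\n";
```

With those inputs the two contexts collapse into the single range 5-17, so no line is printed twice.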

This was a fun problem.  I'm sure it can be done in more efficient
ways.  I'd love to see other solutions to this.

If you have any questions, don't hesitate to ask. :)

Enjoy!

---Jason



#!/usr/bin/perl

# We're trying to mimic 'grep -C', which prints a certain number of
# context lines around each match.  So, if a match occurred on line 10
# in a file invoked with 'grep -C 5', then the user would see lines 5
# thru 15 on their terminal (5 lines before, the match line, and 5
# lines after).
#
# However, grep -C *never* prints the same line twice, so if two
# matches occur within the "context", then grep -C is smart enough to
# not reprint any of the previous match's context.  For example, if
# there was a match on line 10 & on line 12, then grep -C would print
# lines 5 thru 17.  Cute, huh?
#
# Invoke with:
#
# grep-c.pl [--debug] [--context NUM] regex file1 file2...
#

use strict;
use warnings;
use Getopt::Long;

# Set to true if you want debugging output.
my $debug = 0; # Default.

# Number of context lines to print.
my $num_ctxt = 2; # Default.

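# GetOptions warns about bad options itself and does not set $!, so
# print a usage message on failure instead.
GetOptions( 'debug' => \$debug, 'context=i' => \$num_ctxt )
   or die "Usage: $0 [--debug] [--context NUM] regex file1 file2...\n";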

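# User passes the regex on the command line.  Test with defined() so
# that a regex of '0' is not mistaken for a missing argument.
my $re = shift @ARGV;
die "Need regex!\n" unless defined $re;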

# Unprinted lines that we have already seen.
my @back_context;

# "Future" lines that we have yet to check against the regex.  These
# lines will already have been printed.
my @next_lines;

# Current line.
my $line;

# If there are future lines that we've printed but haven't yet tested
# against our regex, then read from that list.  Else go to the file
# itself and request the next line.
while( @next_lines or defined($_ = <>) ) {
   print "At loop: \$_ = $_" if $debug;

   # The logic is this: @back_context only stores previous lines that
   # we have not yet printed.  Since all of the lines in @next_lines
   # are printed when we have a match, we don't update @back_context
   # when reading lines from @next_lines.
   if( @next_lines ) {
      $line = shift @next_lines;
   } else {
      $line = $_;
      push @back_context, $line;
      shift @back_context if $#back_context > $num_ctxt;
   }

   print "\$line = $line, #next_lines = " . @next_lines .
         ", #back_context = " . @back_context . "\n"
            if $debug;

   next unless $line =~ /$re/;

   # Unprinted lines that we have yet to see.  We want to never print
   # duplicate lines, so @forward_context only holds lines that are
   # later in the file than @next_lines (which have already been
   # printed).
   my @forward_context;

   # Lookahead so that we have at least $num_ctxt lines ahead in the
   # file.  (If the user didn't want any context, $num_ctxt will be 0
   # and we skip the lookahead; the match line itself is still printed
   # from @back_context.)
   while( $num_ctxt && defined($_ = <>) ) {
      print "Adding to \@next: $_" if $debug;
      push @next_lines, $_;
      push @forward_context, $_;
      last if @next_lines >= $num_ctxt;
   }

   print foreach @back_context, @forward_context;

   # There are no more unprinted previously seen lines, since we just
   # printed them all!
   undef @back_context;

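   # Are there any lines left in the file?  If the lookahead hit
   # end-of-file, everything still in @next_lines has already been
   # printed, and calling <> again would start reading from STDIN.
   exit unless defined $_;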
}
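
For comparison, here is a rough sketch of a different way to get the same no-duplicates behaviour. It is not the script above: it takes its lines from an array instead of <>, and the helper name context_lines is made up for illustration. Instead of looking ahead, it keeps a count of trailing context lines still owed after the most recent match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that would be printed for the
# given regex and context size, in order and without duplicates.
sub context_lines {
    my ($re, $num_ctxt, @lines) = @_;
    my @out;          # lines selected for printing
    my @before;       # up to $num_ctxt most recent unprinted lines
    my $after = 0;    # trailing context lines still owed

    for my $line (@lines) {
        if ($line =~ $re) {
            # Flush pending leading context, then the match itself.
            push @out, @before, $line;
            @before = ();
            $after  = $num_ctxt;
        } elsif ($after > 0) {
            # Trailing context of an earlier match.
            push @out, $line;
            $after--;
        } else {
            # Not printed yet: remember it as possible leading context.
            push @before, $line;
            shift @before if @before > $num_ctxt;
        }
    }
    return @out;
}

# Twenty sample lines, matches on lines 10 and 12, 2 lines of context.
my @lines = map { "line $_\n" } 1 .. 20;
print context_lines(qr/line (?:10|12)\n/, 2, @lines);
```

Because a line is only ever stored in @before while it is still unprinted, overlapping contexts cannot produce duplicates; the sample call prints "line 8" through "line 14" once each.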

