Re: Perl Solution (was --> BASH scripting question)

John W. Krahn Wed, 19 Jan 2005 03:23:12 -0800

Brihas Sarathy wrote:

I learned something new and thought you may be interested.
Regards,
Brihas
Problem: I have a text file in UNIX format (i.e. UNIX EOL) to which I would like to do the following: FOR {each blank line} WHERE {the subsequent line starts with a digit (i.e. 1-9)}, {add a % to the blank line} AND {delete the subsequent line (i.e. the one starting with a digit)}

=======================================
Solution # 1

perl -i.bak -ne'/\S/?print:(($a=<>)=~/^\d/)?print"%$_":do{print;$_=$a;redo}' yourfile
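
The one-liner is dense, so here is one way it might be expanded into a longer form. This is only a sketch and not the original author's code: the subroutine name mark_blank_lines, the sample text, and the in-memory filehandles are assumptions made for illustration, and unlike the one-liner it checks for end of file before testing the next line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy lines from $in to $out,
# turning a blank line into "%" when the next line starts with a digit,
# and dropping that digit line.
sub mark_blank_lines {
    my ($in, $out) = @_;
    my $line = <$in>;
    while (defined $line) {
        if ($line =~ /\S/) {            # /\S/ ? print : ...
            print {$out} $line;
            $line = <$in>;
            next;
        }
        my $next = <$in>;               # ($a = <>)
        if (defined $next && $next =~ /^\d/) {
            print {$out} "%$line";      # print "%$_" (blank line keeps its newline)
            $line = <$in>;              # the digit line is never printed
        }
        else {
            print {$out} $line;         # do { print; $_ = $a; redo }
            $line = $next;
        }
    }
    return;
}

# Demo on made-up sample text, using in-memory filehandles.
my $input = "a\n\n1 one\nb\n\nc\n";
open my $in,  '<', \$input     or die "cannot open in-memory input: $!";
open my $out, '>', \my $result or die "cannot open in-memory output: $!";
mark_blank_lines($in, $out);
close $out;
print $result;
```

In the sample, the blank line before "1 one" becomes "%" and "1 one" is dropped, while the blank line before "c" is left alone because "c" does not start with a digit.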

My Comments: Works as advertised, but I don't understand what all the code means :(
=======================================
Solution # 2
bash programs make heavy use of awk and sed (and other little
languages and utilities) to do what they do. bash on its own is
pretty weak, but its power is that it integrates so well with other
shell utilities. This could have been done with sed. However,
Perl's "s" flag for regular expressions, which lets "." match
newlines just like any other character, makes it
easy to do this in Perl. I've attached a little script. This
is the heart of it:
1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};

The "1 while /pattern/" idiom is great if you have a nested or overlapping pattern, but your example does not. The character class [0-9] could have been written as \d and the character class [^\n]+ could have been written as .+. So, to sum up, that line could have been written as:

$buf =~ s/(?<=\n)\n(\d.+)/%/g;
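
To check that the two forms agree, here is a small sketch that runs both on the same text. The sample text is made up, and the subroutine names with_loop and with_lookbehind are hypothetical wrappers added only for the comparison. The sample deliberately puts two blank-line/digit-line pairs back to back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample text, not from the original thread.
my $sample = "alpha\n\n1 first\n\n2 second\nbeta\n\ngamma\n";

# Looping form, as in the posted script.
sub with_loop {
    my ($buf) = @_;
    1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};
    return $buf;
}

# Single-pass form using a lookbehind and the /g flag.
sub with_lookbehind {
    my ($buf) = @_;
    $buf =~ s/(?<=\n)\n(\d.+)/%/g;
    return $buf;
}

my $looped = with_loop($sample);
my $single = with_lookbehind($sample);

print $looped eq $single ? "same result\n" : "results differ\n";
print $single;
```

Because the lookbehind does not consume the preceding newline and the match stops before the newline that ends the digit line, a single /g pass can handle back-to-back pairs without restarting from the top.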

Normally this would be written without the "1 while", but the
next match could (and often does) start in the text that was
substituted in, so this will try again from the start after each successful
match and finally stop after it fails to find any match.

I'm using the { } characters to bracket the matching pattern and
the replacement text portions of the substitution to make
it more readable. This could have been written like s/// instead
of s{}{}.

The matching pattern is two newlines (\n\n) in a row, which indicates a blank line (the end of one line which probably had something on it and then another line end immediately without anything on it), then a digit (0-9, which could also have been written as \d, but in this example written as [0-9]), a span of one or more characters other than the newline (written as [^\n]+), and finally the newline that ends that line.

When that matching pattern is found, the text \n%\n (\n translates
to a newline) is substituted in.

The rest of the program is just reading the first argument, doing file
I/O, and error handling (the "or die" clauses). Error handling tries to
ensure that the original file isn't overwritten if something goes wrong,
and covers the other bad scenarios I could think of. To be more user
friendly, the print statement could have had an "or die" on it to detect
a suddenly full disc and other scenarios =)
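
As a rough illustration of that last point, the output step could check print and close as well as open. This is a sketch under assumptions, not part of the posted script: the helper name write_checked is hypothetical, and the demo writes to a temporary file from File::Temp rather than to the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of the posted script): write $content to
# $path and die with the system error if any step fails.
sub write_checked {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "could not open $path for writing: $!";
    print {$fh} $content    or die "could not write to $path: $!";
    # Output is buffered, so a full disk may only be reported at close.
    close $fh               or die "could not close $path: $!";
    return;
}

# Demo: write to a temporary file and read it back.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;
write_checked($tmp_name, "a\n%\nb\n");

open my $check, '<', $tmp_name or die "could not reopen $tmp_name: $!";
print <$check>;
close $check;
```

Checking close matters here because buffered data may not reach the disk until the handle is flushed.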

<--START perl code-->
#!/usr/bin/perl

use strict;
use warnings;

=begin comment

FOR {each blank line} WHERE {the subsequent line starts with a digit
(i.e. 1-9)},
{add a % to the blank line} AND {delete the subsequent line (i.e. the
one starting with a digit)}

=cut

my $fn = shift @ARGV; # first command-line argument

open my $f, '<', $fn    or die "usage: $0 <infile>";
read $f, my $buf, -s $f or die "could not read from file $fn: $!";
close $f;
rename $fn, "$fn.orig"  or die "could not rename $fn to $fn.orig: $!";
open $f, '>', $fn       or die "could not open $fn for writing: $!";

1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};

print $f $buf;
close $f;
<--END perl code-->

My Comments: Works as advertised, and I got to learn some Perl! :)
=======================================


Are you related to Gurusamy Sarathy at ActiveState?

John
--
use Perl;
program
fulfillment
