I learned something new and thought you might be interested.
Regards, Brihas
Problem:
I have a text file in UNIX format (i.e. UNIX EOL) to which I would like to do the following:
FOR {each blank line} WHERE {the subsequent line starts with a digit (i.e. 1-9)},
{add a % to the blank line} AND {delete the subsequent line (i.e. the one starting with a digit)}
=======================================
Solution # 1
perl -i.bak -ne'/\S/?print:(($a=<>)=~/^\d/)?print"%$_":do{print;$_=$a;redo}' yourfile
My Comments: Works as advertised, but I don't understand what all the code means :(
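For anyone else puzzled by the one-liner, here is a rough long-hand sketch of what it appears to do: copy non-blank lines through, and on a blank line peek at the next line to decide whether to mark and drop. The name mark_blank_lines and the in-memory filehandle are hypothetical choices for illustration, not part of the original, and unlike the one-liner the sketch checks for end of input explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the one-liner's logic to a string
# and returns the transformed text instead of editing a file in place.
sub mark_blank_lines {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot read from string: $!";
    my $out  = '';
    my $line = <$in>;
    while (defined $line) {
        if ($line =~ /\S/) {            # non-blank line: copy it through
            $out .= $line;
            $line = <$in>;
            next;
        }
        my $next = <$in>;               # blank line: look ahead one line
        if (defined $next && $next =~ /^\d/) {
            $out .= "%$line";           # mark the blank line, drop the digit line
            $line = <$in>;
        }
        else {
            $out .= $line;              # keep the blank line unchanged
            $line = $next;              # re-examine the look-ahead line (the "redo")
        }
    }
    close $in;
    return $out;
}

print mark_blank_lines("foo\n\n1 abc\nbar\n\nbaz\n");
```

The else branch plays the role of the one-liner's do{print;$_=$a;redo}: the line that was read ahead is processed again as if it had just been read.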
=======================================
Solution # 2
bash programs make heavy use of awk and sed (and other little languages and utilities) to do what they do. bash on its own is pretty weak, but its power is that it integrates so well with other shell utilities. This could have been done with sed. However, Perl's "s" flag for regular expressions, which lets "." match a newline just like any other character, makes it easy to work across line boundaries in Perl (the script below matches \n explicitly, so it does not actually need the flag). I've attached a little script. This is the heart of it:
1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};
The "1 while /pattern/" idiom is great if you have a nested or overlapping pattern, but your example does not. The character class [0-9] could have been written as \d, and the character class [^\n]+ could have been written as .+ (without the "s" flag, "." does not match a newline). So, to sum up, that line could have been written as:
$buf =~ s/(?<=\n)\n(\d.+)/%/g;
Normally the original substitution would be written without the "1 while", but the next match could (and often does) start in the text that was substituted in, so the loop tries again from the start of the buffer after each successful substitution and stops once no match is found.
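To make the difference concrete, here is a small sketch that runs three forms on a buffer where two blank-line/digit-line pairs sit back to back. The function names and the sample string are made up for illustration; only the two substitutions themselves come from the message above.

```perl
use strict;
use warnings;

# Original form: restart the search after every successful substitution.
sub with_loop {
    my ($buf) = @_;
    1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};
    return $buf;
}

# Same pattern in a single /g pass: the trailing newline it consumes
# cannot also serve as the leading newline of the next match.
sub with_plain_g {
    my ($buf) = @_;
    $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n}g;
    return $buf;
}

# Lookbehind form: consumes only the blank line's newline and the
# digit line (without its newline), so one /g pass is enough here.
sub with_lookbehind {
    my ($buf) = @_;
    $buf =~ s/(?<=\n)\n(\d.+)/%/g;
    return $buf;
}

my $sample = "a\n\n1x\n\n2y\nb\n";
for my $f (\&with_loop, \&with_plain_g, \&with_lookbehind) {
    (my $shown = $f->($sample)) =~ s/\n/\\n/g;   # make newlines visible
    print "$shown\n";
}
# prints:
#   a\n%\n%\nb\n
#   a\n%\n\n2y\nb\n
#   a\n%\n%\nb\n
```

On this sample the plain /g pass misses the second pair, while the "1 while" loop and the lookbehind version agree.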
I'm using the { } characters to bracket the matching pattern and the replacement text portions of the regular expression to make it more readable. This could have been written like s/// instead of s{}{}.
The matching pattern is two newlines in a row (\n\n), which indicate a
blank line (the end of one line, which probably had something on it,
followed immediately by another line end with nothing in between), then
a digit (written as [0-9], which could also have been written as \d), a
span of one or more characters other than the newline (written as
[^\n]+), and finally the newline that ends the digit line.
When that matching pattern is found, the text \n%\n (\n translates to a newline) is substituted in.
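The same pattern can be spelled out piece by piece with the "x" flag, which allows whitespace and comments inside the regular expression. This is only a sketch: both function names are hypothetical, and the second variant (using * instead of +) is an assumption about what may be wanted, since the + in the original requires at least one more character after the digit and so skips a line holding a single digit.

```perl
use strict;
use warnings;

# Hypothetical helper: the script's substitution, annotated with /x.
sub mark_annotated {
    my ($buf) = @_;
    1 while $buf =~ s{
        \n          # end of the previous line
        \n          # the blank line itself
        [0-9]       # the next line starts with a digit
        [^\n]+      # one or more further characters on that line
        \n          # end of the digit line
    }{\n%\n}x;
    return $buf;
}

# Assumed variant: * instead of +, so a line that is only a digit also matches.
sub mark_any_digit_line {
    my ($buf) = @_;
    1 while $buf =~ s{
        \n \n       # end of the previous line, then the blank line
        [0-9]       # the next line starts with a digit
        [^\n]*      # zero or more further characters on that line
        \n          # end of the digit line
    }{\n%\n}x;
    return $buf;
}

print mark_annotated("a\n\n12 abc\nb\n");
print mark_any_digit_line("a\n\n5\nb\n");
```

Note that both versions, like the original, need a newline before the blank line, so a blank line at the very top of the file is left alone.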
The rest of the program is just reading the first argument, doing file I/O, and error handling (the "or die" clauses). The error handling tries to ensure that the original file isn't overwritten if something goes wrong, and covers the other bad scenarios I could think of. To be more user-friendly, the print statement could have had an "or die" on it to detect a suddenly full disk and similar failures =)
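As a minimal sketch of that last suggestion, the write step could check print and close as well as open. The helper name write_checked is hypothetical, and the temporary file is there only to make the example runnable on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $buf to $path, dying on any I/O failure.
sub write_checked {
    my ($path, $buf) = @_;
    open my $out, '>', $path or die "could not open $path for writing: $!";
    print {$out} $buf        or die "could not write to $path: $!";
    # Output is buffered, so some write errors only show up at close time.
    close $out               or die "could not close $path: $!";
    return 1;
}

# Usage: write a small buffer to a throwaway temporary file.
my ($fh, $tmp) = tempfile(UNLINK => 1);
close $fh;
write_checked($tmp, "foo\n%\nbar\n");
print "wrote ", -s $tmp, " bytes to a temporary file\n";
```

Checking close matters because a full disk may not be noticed until the buffered data is flushed.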
<--START perl code-->
#!/usr/bin/perl
use strict;
use warnings;

=begin comment
FOR {each blank line} WHERE {the subsequent line starts with a digit (i.e. 1-9)}, {add a % to the blank line} AND {delete the subsequent line (i.e. the one starting with a digit)}
=cut

my $fn = shift @ARGV; # first command-line argument
open my $f, '<', $fn or die "usage: $0 <infile>";
read $f, my $buf, -s $f or die "could not read from file $fn: $!";
close $f;
rename $fn, "$fn.orig" or die "could not rename $fn to $fn.orig: $!";
open $f, '>', $fn or die "could not open $fn for writing: $!";
1 while $buf =~ s{\n\n([0-9][^\n]+)\n}{\n%\n};
print $f $buf;
close $f;
<--END perl code-->
My Comments: Works as advertised, and I got to learn some Perl! :)
=======================================
Are you related to Gurusamy Sarathy at ActiveState?
John -- use Perl; program fulfillment