Hello All --
I'm new to this forum, and hope someone can help me with a seemingly simple
problem.
I am reading in a tagged text file and focusing on one particular field
within it. I need to examine each line, determine whether two consecutive
lines contain the same link string, and if they do, combine them into one
line and remove the original two, while leaving any unmatched lines alone.
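The matching step can be sketched on its own. This is only a minimal sketch, assuming every candidate line has the form "<word> by <LNK:...>...</LNK>" as in the sample input below; the helper name split_link_line is hypothetical and is not part of the script posted further down.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line such as
#   Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
# into its leading word and its link string.
# Returns an empty list when the line does not have that shape.
sub split_link_line {
    my ($line) = @_;
    return $line =~ /^(\w+) by (<LNK:[^>]*>.*?<\/LNK>)\s*$/ ? ($1, $2) : ();
}
```

Two lines are candidates for combining when the second value returned for each of them is the same string.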
Enough history, here's what I'm getting.
Input example:
!EC
1999 TNT 230-4
!CU
Administrative Rulings
!CU
Administrative Rulings
!CS
IRS Revenue Rulings
!DN
Doc 1999-37669 (3 original original pages)
!TS #each of the following pairs should be combined because the <LNK ... strings match
Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
!GI
United States
!IA
Internal Revenue Service
!DP
30 Nov 1999
!PD
01 Dec 1999
. . .
=========================================
Output example:
!EC
1999 TNT 230-4
!CU
Administrative Rulings
!CU
Administrative Rulings
!CS
IRS Revenue Rulings
!DN
Doc 1999-37669 (3 original original pages)
!TS #notice that the first of the two lines remains, and the second is properly removed
#the lines are also combined as I want them. The 'old' and 'new' tags are for my viewing
old Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
new Modified and Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
old Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
new Superseded and Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
!GI
United States
!IA
Internal Revenue Service
!DP
30 Nov 1999
!PD
01 Dec 1999
. . .
I think I'm very close here, but if anyone can help with my code, I would
appreciate it greatly.
Code follows (I realize that some is extraneous):
$file = $ARGV[0];
$output = $ARGV[1];
open (INPUT, "$file") || die "Can't open $file: $!\n";
open (OUTPUT, ">$output") || die "Can't create $output: $!\n";
$tson = 0;
while (<INPUT>){
    (/!TS/) && ($tson = 1);
    (/!GI/||/!IA/||/!DP/||/!IR/||/!HN/||/!CF/) && ($tson = 0);
    if ($tson == 1){
        $ts = $_;
        if (/((\w+) by (<LNK\:.+>.+<\/LNK>))/){
            $next_whole = $1;
            $next = $2;
            $comp_link = $3;
            s/$_//g;
            if (($first ne $next) && ($link eq $comp_link) && ($next ne '')){
                print (OUTPUT "new $first and $next by $link\n");
                #print ("new $first and $next by $link\n");
                s/$_//g;
                s/$whole//;
                $whole = $next_whole;
                s/$next_whole//;
                #$next_whole = '';
                $comp_link = '';
                $ts = '';
                $first = '';
                #$link = '';
                $next = '';
            }
            else{
                s/$_//g;
                print (OUTPUT "old $next_whole\n");
                print ("old $next_whole\n");
                $next = '';
                $comp_link = '';
                $next_whole = '';
                $whole = '';
            }
        }
        if ($ts =~ /((\w+) by (<LNK\:.+>.+<\/LNK>))/){
            $whole = $1;
            $first = $2;
            $link = $3;
            #$next_whole = '';
            $ts = '';
            s/$_//g;
        }
    }
    print OUTPUT;
}
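For comparison, one possible way to get the combined lines without printing the first line of each pair is to hold a matching line back until the following line has been seen. This is a rough sketch rather than a fix of the script above, and it rests on some assumptions: the lines have already been chomped, any line starting with "!" begins a new field (the script above only checks a fixed list of tags), and the name merge_ts_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: inside a !TS field, merge consecutive
# "<word> by <LNK...>" lines whose link strings are identical.
# Expects chomped lines; returns the rewritten list of lines.
sub merge_ts_lines {
    my @lines = @_;
    my @out;
    my $in_ts = 0;
    my ($verbs, $link);    # pending entry, held until the next line is seen

    # Emit the pending entry, if any, and clear it.
    my $flush = sub {
        push @out, "$verbs by $link" if defined $link;
        ($verbs, $link) = (undef, undef);
    };

    for my $line (@lines) {
        if ($line =~ /^!/) {
            # Assumption: every tag line starts with "!" and ends the field.
            $flush->();
            $in_ts = $line =~ /^!TS/ ? 1 : 0;
            push @out, $line;
        }
        elsif ($in_ts && $line =~ /^(\w+) by (<LNK:[^>]*>.*?<\/LNK>)\s*$/) {
            my ($verb, $this_link) = ($1, $2);
            if (defined $link && $this_link eq $link) {
                $verbs .= " and $verb";    # same link: combine the words
            }
            else {
                $flush->();                # different link: emit the old entry
                ($verbs, $link) = ($verb, $this_link);
            }
        }
        else {
            $flush->();
            push @out, $line;              # everything else passes through
        }
    }
    $flush->();
    return @out;
}
```

A driver could read the file with chomp(my @lines = <INPUT>) and print each returned line followed by "\n". Under these assumptions, three or more consecutive lines with the same link would be joined as "A and B and C by ...".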
Thanks very much, I hope this post is appropriate.
Paul Binkley
Tax Analysts