Hello All --

I'm new to this forum, and hope someone can help me with a seemingly simple
problem.

I am reading in a tagged text file and focusing on one particular field
within it. I need to examine each line in that field and determine whether it
shares a matching link string with the line before it. If it does, I want to
combine the two into a single line and remove both originals, while leaving
any unmatched lines alone.

Enough background; here is my input and what I'm currently getting.

Input example:

!EC
1999 TNT 230-4
!CU
Administrative Rulings
!CU
Administrative Rulings
!CS
IRS Revenue Rulings
!DN
Doc 1999-37669 (3 original original pages)
!TS  #each of the following pairs should be combined because the <LNK ... strings match
Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
!GI
United States
!IA
Internal Revenue Service
!DP
30 Nov 1999
!PD
01 Dec 1999
. . .
=========================================

Output example:

!EC
1999 TNT 230-4
!CU
Administrative Rulings
!CU
Administrative Rulings
!CS
IRS Revenue Rulings
!DN
Doc 1999-37669 (3 original original pages)
!TS  #notice that the first of the two lines remains, and the second is properly removed
     #the lines are also combined as I want them. The 'old' and 'new' tags are for my viewing
old Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
new Modified and Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
old Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
new Superseded and Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
!GI
United States
!IA
Internal Revenue Service
!DP
30 Nov 1999
!PD
01 Dec 1999
. . .

I think I'm very close here, but if anyone can help with my code, I would
appreciate it greatly.
Code follows (I realize that some of it is extraneous):

$file = $ARGV[0];
$output = $ARGV[1];

open (INPUT, "$file") || die "Can't open $file: $!\n";
open (OUTPUT, ">$output") || die "Can't create $output: $!\n";

$tson = 0;


while (<INPUT>){

     (/!TS/) && ($tson = 1);
     (/!GI/||/!IA/||/!DP/||/!IR/||/!HN/||/!CF/) && ($tson = 0);
     if ($tson == 1){
          $ts = $_;

          if (/((\w+) by (<LNK\:.+>.+<\/LNK>))/){
               $next_whole = $1;
               $next = $2;
               $comp_link = $3;
               s/$_//g;

               if (($first ne $next) && ($link eq $comp_link) && ($next ne '')){
                    print (OUTPUT "new $first and $next by $link\n");
                    #print ("new $first and $next by $link\n");
                    s/$_//g;
                    s/$whole//;
                    $whole = $next_whole;
                    s/$next_whole//;
                    #$next_whole = '';
                    $comp_link = '';
                    $ts = '';
                    $first = '';
                    #$link = '';
                    $next = '';

               }

               else{
                    s/$_//g;
                    print (OUTPUT "old $next_whole\n");
                    print ("old $next_whole\n");
                    $next = '';
                    $comp_link = '';
                    $next_whole = '';
                    $whole = '';
               }

          }
          if ($ts =~ /((\w+) by (<LNK\:.+>.+<\/LNK>))/){
               $whole = $1;
               $first = $2;
               $link = $3;
               #$next_whole = '';
               $ts = '';
               s/$_//g;
          }
     }
print OUTPUT;
}
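
A minimal sketch of one way the desired merging could work, offered as a
possible starting point rather than a fix to the code above. It holds back
each "<word> by <LNK...>" line until the next line shows whether the link
matches, so the first line of a pair is never printed on its own. The helper
name merge_ts_lines is made up for illustration, and the sketch assumes that
any line beginning with "!" is a tag line that ends the !TS field (the code
above uses an explicit list of tags instead).

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a file (newlines removed) and
# returns them with matching entries inside the !TS field merged.
sub merge_ts_lines {
    my @in = @_;
    my @out;
    my $in_ts = 0;          # true while inside the !TS field
    my ($verbs, $link);     # pending entry that has not been written yet

    # Write the pending entry, if there is one, and clear it.
    my $flush = sub {
        push @out, "$verbs by $link" if defined $link;
        ($verbs, $link) = (undef, undef);
    };

    for my $line (@in) {
        if ($line =~ /^!(\w+)/) {
            # Assumption: every line starting with "!" is a tag line.
            my $tag = $1;
            $flush->();
            $in_ts = ($tag eq 'TS');
            push @out, $line;
        }
        elsif ($in_ts && $line =~ /^(\w+) by (<LNK:.+<\/LNK>)\s*$/) {
            my ($verb, $this_link) = ($1, $2);
            if (defined $link && $link eq $this_link) {
                $verbs .= " and $verb";    # same link: fold into pending entry
            }
            else {
                $flush->();                # different link: write the old one
                ($verbs, $link) = ($verb, $this_link);
            }
        }
        else {
            $flush->();                    # unmatched line: pass it through
            push @out, $line;
        }
    }
    $flush->();                            # write anything left at end of input
    return @out;
}

# Small demonstration using lines from the input example.
my @demo = (
    '!TS',
    'Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>',
    'Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>',
    'Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>',
    '!GI',
    'United States',
);
print "$_\n" for merge_ts_lines(@demo);
```

In the demonstration the first two entries share a link, so they come out as
one "Modified and Amplified by ..." line, while the lone "Superseded by ..."
line is left unchanged. To use this on a real file, the lines could be read
from the input handle, chomped, passed to the helper, and the result printed
to the output handle.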

Thanks very much, I hope this post is appropriate.

Paul Binkley
Tax Analysts
[EMAIL PROTECTED]

