I am writing a script to convert RTF to XML, and my output looks like
this: (I explain this ugliness below)

<id1listlevel1> 
text
<id1listlevel2>
text
</id1listlevel2>
</id1listlevel1>
text
<id1listlevel1> 
text
<id1listlevel2>
text
</id1listlevel2>
</id1listlevel1>

I know that is ugly to read, but I'm just pointing out that the tags
repeat themselves. It should look like this:

<id1listlevel1> 
text
<id1listlevel2>
text
text
text
text
</id1listlevel2>
</id1listlevel1>

In other words, the list starts, stops in the middle, and then starts
again. In order to get rid of these excessive tags, I was thinking of
reading the whole file into memory at once, and then doing this
substitution:

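# Assumes the whole file has already been read into $_.
# The greedy .* with /s captures everything from the first opening
# tag to the last closing tag as a single chunk.
my @array = split /(<id1listlevel1>.*<\/id1listlevel1>)/s, $_;
for my $name (@array) {
        if ($name =~ /<id1listlevel1>/) {
                # Strip every opening and closing tag, then rewrap once.
                $name =~ s/<\/?id1listlevel1>//g;
                $name = "<id1listlevel1>$name</id1listlevel1>";
        }
        print $name;
}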

my @array = split /(<id1listlevel2>.*<\/id1listlevel2>)/s, $_;
....

I would actually use a loop for each level.
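
A minimal sketch of what that per-level loop might look like once the file is in memory. The helper names collapse_tag and collapse_levels are made up for illustration, and the sketch assumes each tag name belongs to a single logical list, so everything between the first opening tag and the last closing tag is meant to end up inside one pair:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse all repeated <$tag>...</$tag> pairs in
# $text into one pair running from the first opening tag to the last
# closing tag.
sub collapse_tag {
    my ($text, $tag) = @_;
    my $open  = "<$tag>";
    my $close = "</$tag>";
    my $first = index($text, $open);
    my $last  = rindex($text, $close);

    # Nothing to do if the tag is missing or the pair is out of order.
    return $text if $first < 0 || $last < $first;

    my $start  = $first + length($open);
    my $before = substr($text, 0, $first);
    my $inner  = substr($text, $start, $last - $start);
    my $after  = substr($text, $last + length($close));

    # Drop every inner copy of the tag, plus the rest of its line.
    $inner =~ s/<\/?\Q$tag\E>[ \t]*\n?//g;

    return $before . $open . $inner . $close . $after;
}

# Hypothetical helper: apply collapse_tag once per level, outermost first.
sub collapse_levels {
    my ($text, @tags) = @_;
    $text = collapse_tag($text, $_) for @tags;
    return $text;
}
```

Calling collapse_levels($text, 'id1listlevel1', 'id1listlevel2') on the output shown at the top should leave one level-1 pair wrapping one level-2 pair. It still needs the whole file in a single string, so it does not answer the memory question.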

However, isn't it a bad idea to read the whole file in at once? What
happens if the user has a really huge file?

My other method was to read my result file one line at a time. Once I
found <id1listlevel1> or anything that matches a similar pattern, I
would push it into an array. If I found it again, I would simply
delete it. Then I would read the file backwards one line at a time,
look for the pattern </id1listlevel1>, and allow only the first one of
these.
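
A rough sketch of a line-at-a-time variant of this idea. Instead of reading the file backwards, it makes two forward passes: the first records where the last closing tag of each name sits, and the second prints everything except the redundant tags. The helper name collapse_file and the tag pattern id<digits>listlevel<digits> are assumptions made for illustration, and the sketch assumes every list tag is on a line by itself:

```perl
use strict;
use warnings;

# Hypothetical helper: stream $path twice and write the cleaned-up
# result to the filehandle $out. Only one small hash entry per tag
# name is kept in memory, never the whole file.
sub collapse_file {
    my ($path, $out) = @_;

    # Assumed tag shape: <id1listlevel1>, </id1listlevel2>, and so on.
    my $tag_re = qr/^\s*<(\/?)(id\d+listlevel\d+)>\s*$/;

    # Pass 1: remember the line number of the last closing tag per name.
    my %last_close;
    open my $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        $last_close{$2} = $. if $line =~ $tag_re && $1 eq '/';
    }
    close $in;    # an explicit close also resets $.

    # Pass 2: keep only the first opening tag and the last closing tag.
    my %seen_open;
    open $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        if ($line =~ $tag_re) {
            my ($slash, $name) = ($1, $2);
            if ($slash eq '/') {
                next unless $. == $last_close{$name};
            }
            else {
                next if $seen_open{$name}++;
            }
        }
        print {$out} $line;
    }
    close $in;
}
```

For example, collapse_file('result.xml', \*STDOUT) would print the cleaned-up file to standard output, where result.xml is a placeholder file name. The cost is reading the file twice, in exchange for memory use that does not grow with file size.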

So, should I slurp or do it one line at a time?

In case you are wondering why my output has extra tags in the middle,
you can blame good ol' Bill Gates. RTF really does suck. In earlier
versions of RTF, the code told you when the user was skipping an item
in a list (but was still continuing the list). For a reason known only
to the morons at micro$oft, they changed this code in Word 97 and
2000.

Thanks

Paul





-- 
************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
