I am writing a script to convert RTF to XML, and my output looks like this (I explain the ugliness below):

    <id1listlevel1> text <id1listlevel2> text </id1listlevel2> </id1listlevel1>
    text
    <id1listlevel1> text <id1listlevel2> text </id1listlevel2> </id1listlevel1>

I know that is ugly to read, but I'm just pointing out that the tags repeat themselves. It should look like this:

    <id1listlevel1> text <id1listlevel2> text text text text </id1listlevel2> </id1listlevel1>

In other words, the list starts and then stops in the middle.

In order to get rid of these excessive tags, I was thinking of reading the whole file into memory at once and then doing this substitution:

    # capture everything from the first opening tag to the last closing tag
    my @array = split /(<id1listlevel1>.*<\/id1listlevel1>)/s, $_;
    for my $name (@array) {
        if ($name =~ /<id1listlevel1>/) {
            # strip every level-1 tag inside the span, then wrap it once
            $name =~ s/<\/?id1listlevel1>//g;
            $name = "<id1listlevel1>$name</id1listlevel1>";
        }
        print $name;
    }
    @array = split /(<id1listlevel2>.*<\/id1listlevel2>)/s, $_;
    ....

I would actually use a loop for each level.

However, isn't it a bad idea to read the whole file in at once? What happens if the user has a really huge file?

My other method was to read my result file one line at a time. Once I found <id1listlevel1> or anything that matches a similar pattern, I would push it into an array. If I found it again, I would simply delete it. Then I would read the file backwards one line at a time, look for the pattern </id1listlevel1>, and allow only the first one of these.

So, should I slurp or do it one line at a time?

In case you are wondering why my output has extra tags in the middle, you can blame good ol' Bill Gates. RTF really does suck. In earlier versions of RTF, the code told you when the user was skipping an item in a list (but was still continuing the list). For a reason known only to the morons at micro$oft, they changed this code in Word 97 and 2000.

Thanks

Paul

--
************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
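
P.S. For what it's worth, here is a rough sketch of the one-line-at-a-time idea. It is only a sketch under some assumptions: instead of reading the file backwards, it makes two forward passes (the first just counts closing tags, so the second knows which one is the last). The sub name merge_list_tags is made up for illustration, the input handle has to be seekable, and I am assuming each tag name such as id1listlevel1 belongs to exactly one list. Whitespace left behind by the removed tags is not cleaned up.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first opening tag and the last closing
# tag of every id<N>listlevel<M> element, reading one line at a time.
# Assumes $in is seekable (a regular file or an in-memory handle).
sub merge_list_tags {
    my ($in, $out) = @_;

    # Pass 1: count how many times each closing tag occurs.
    my %closes_left;
    while (my $line = <$in>) {
        $closes_left{$1}++ while $line =~ m{</(id\d+listlevel\d+)>}g;
    }

    # Pass 2: rewind and filter the tags.
    seek $in, 0, 0 or die "cannot rewind input: $!";
    my %seen_open;
    while (my $line = <$in>) {
        # Keep an opening tag only the first time it appears.
        $line =~ s{<(id\d+listlevel\d+)>}{ $seen_open{$1}++ ? '' : "<$1>" }ge;
        # Keep a closing tag only when it is the last one of its kind.
        $line =~ s{</(id\d+listlevel\d+)>}{ --$closes_left{$1} ? '' : "</$1>" }ge;
        print {$out} $line;
    }
    return;
}
```

To try it, open the converted file for reading, pass that handle and \*STDOUT to merge_list_tags, and compare the result with the original. Memory use stays at one line plus two small hashes, at the cost of reading the file twice.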