On Thursday, August 15, 2002, at 07:27, Paul Tremblay wrote:

> I am writing a script to convert RTF to XML, and my output looks like
> this: (I explain this ugliness below)
>
> <id1listlevel1>
> text
> <id1listlevel2>
> text
> </id1listlevel2>
> </id1listlevel1>
> text
> <id1listlevel1>
> text
> <id1listlevel2>
> text
> </id1listlevel2>
> </id1listlevel1>
>
> I know that is ugly to read, but I'm just pointing out that the tags
> repeat themselves. It should look like this:
>
> <id1listlevel1>
> text
> <id1listlevel2>
> text
> text
> text
> text
> </id1listlevel2>
> </id1listlevel1>


Actually, let's start with the first part of the problem I see here,
by re-indenting that first list so that its structure is more obvious:

<dict>
        <id1listlevel1>
                text_1
                <id1listlevel2> text_2 </id1listlevel2>
        </id1listlevel1>

        text_3

        <id1listlevel1>
                text_4
                <id1listlevel2> text_5 </id1listlevel2>
        </id1listlevel1>
</dict>

Here you will notice that 'text_3' is "outside" of
any "tag" structure you offered, and is only a part
of the 'dictionary' itself. So your 'should look like'
structure may be what you want, but it is not what the
original output showed, unless the original was a typo....

{ sometimes white space can be your friend... 8-) }
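
One way to check which text sits outside every tag is to keep a stack
of the tags that are currently open. The following is only a minimal
sketch in Python, not the script under discussion: the function name
enclosing_tags is made up for illustration, and it assumes each line
holds exactly one tag or one run of text.

```python
def enclosing_tags(all_our_lines):
    """Pair each text line with the list of tags open around it.

    Assumption: every line is a single opening tag, a single closing
    tag, or plain text. This is not a real XML parser.
    """
    stack = []    # names of the tags that are currently open
    report = []   # (text, [enclosing tag names]) pairs
    for raw in all_our_lines:
        line = raw.strip()
        if not line:
            continue                   # ignore blank lines
        if line.startswith("</"):
            if stack:
                stack.pop()            # close the innermost tag; its name is not checked
        elif line.startswith("<"):
            stack.append(line[1:-1])   # "<name>" becomes "name"
        else:
            report.append((line, list(stack)))
    return report


sample = [
    "<id1listlevel1>", "text_1",
    "<id1listlevel2>", "text_2", "</id1listlevel2>",
    "</id1listlevel1>",
    "text_3",
]
for text, tags in enclosing_tags(sample):
    print(text, tags)
```

With that sample, text_3 is reported with an empty tag list, which is
the "outside any tag" case described above.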

So let us assume for the moment that what you meant
was that it really did have, say, this structure:

<dict>
        <id1listlevel1>
                text_1
                <id1listlevel2> text_2 </id1listlevel2>
        </id1listlevel1>
        <id1listlevel1>
                <id1listlevel2> text_3 </id1listlevel2>
        </id1listlevel1>
        <id1listlevel1>
                text_4
                <id1listlevel2> text_5 </id1listlevel2>
        </id1listlevel1>
</dict>

Then you might get closer to what you want... but
you still have the problem that 'text_4' sits
at level1, not at level2....

So a part of the problem that you have is the simple
'how do I look ahead / look behind' problem....

Your "SLURP" idea has the virtue that you can walk
around the whole of the text in some large @all_our_lines....

The alternative is to walk into the text line by line, build
up the appropriate tree structure, and then prune the
tree to what you want....
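
The 'look behind' half of that can be sketched over a slurped list of
lines. Again this is only an illustration in Python under loud
assumptions: merge_adjacent is a made-up name, each line holds exactly
one tag or one run of text, and it only merges a block that reopens
immediately after the same tag was closed (the second structure above),
not the case where loose text sits between the two blocks.

```python
def merge_adjacent(all_our_lines):
    """Join sibling blocks by cancelling a close tag against the
    identical tag reopening on the very next non-blank line.

    Assumption: one tag or one run of text per line.
    """
    out = []
    for raw in all_our_lines:
        line = raw.strip()
        if not line:
            continue   # ignore blank lines
        is_open = line.startswith("<") and not line.startswith("</")
        # Look behind: was the last line we kept the matching close tag?
        if is_open and out and out[-1] == "</" + line[1:]:
            out.pop()  # drop the close, and do not keep the reopen
            continue
        out.append(line)
    return out


slurped = [
    "<id1listlevel1>", "text_1",
    "<id1listlevel2>", "text_2", "</id1listlevel2>",
    "</id1listlevel1>",
    "<id1listlevel1>",
    "<id1listlevel2>", "text_3", "</id1listlevel2>",
    "</id1listlevel1>",
]
print("\n".join(merge_adjacent(slurped)))
```

Because the check is made against the lines already kept, cancelling
the level1 pair exposes the level2 close, so the level2 pair cancels
as well and text_2 and text_3 end up in the same level2 block.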



ciao
drieux
