On Tue, 22 May 2007 03:02:34 -0300, Steven Bethard <[EMAIL PROTECTED]> wrote:

> I have some text and a list of Element objects and their offsets, e.g.::
>
>     >>> text = 'aaa aaa aaabbb bbbaaa'
>     >>> spans = [
>     ...     (etree.Element('a'), 0, 21),
>     ...     (etree.Element('b'), 11, 18),
>     ...     (etree.Element('c'), 18, 18),
>     ... ]
>
> I'd like to produce the corresponding ElementTree. So I want to write a
> get_tree() function that works like::
>
>     >>> tree = get_tree(text, spans)
>     >>> etree.tostring(tree)
>     '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
>
> Perhaps I just need some more sleep, but I can't see an obvious way to
> do this. Any suggestions?

I need *some* sleep, but the idea would be as follows:

- For each span generate two tuples: (start_offset, 1, -end_offset, element)
  and (end_offset, 0, -start_offset, element). If start==end use a single
  tuple (start_offset, -1, start_offset, element) instead.
- Collect all tuples in a list and sort them. The tuples are built so that,
  when one element ends and another starts at the same offset, the ending
  element comes first (because it has a 0). Among the elements that end at
  a given point, the shortest comes first; among the elements that start at
  a given point, the longest comes first (hence the negated end offset).
- Initialize an empty list (it will be used as a stack of containers) and
  create a root Element as your "current container" (CC). The variable
  "last_used" keeps the last position used in the text.
- For each tuple in the sorted list:
  - if the second item is a 1, an element is starting. The pending text,
    text[last_used:start_offset], belongs to the CC (in ElementTree terms:
    its .text if it has no children yet, otherwise the .tail of its last
    child); move last_used to start_offset. Then insert the new element
    into the CC, push the CC onto the stack, and set the new element as
    the new CC.
  - if the second item is a 0, an element is ending. The pending text,
    text[last_used:end_offset], belongs to the CC (the element that is
    ending); move last_used up to end_offset. Then discard the CC and pop
    an element from the stack as the new CC.
  - if the second item is a -1, it's an element with no content. Handle
    the pending text as above, then insert the element into the CC.

You can play with the way the tuples are generated and sorted to get
'<a>aaa aaa aaa<b>bbb bbb</b><c />aaa</a>' instead.
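
The steps above could be sketched roughly as follows, for Python 3 and xml.etree.ElementTree. This is only one possible reading of the idea, not a definitive implementation. Assumptions that are mine: the spans nest properly (no partial overlaps) and a single outermost span covers the whole text; the start tuple carries the negated end offset, so that among elements starting at the same offset the longest one opens first; a running index is added to each tuple so that sorting never has to compare Element objects; the helper _add_text and the parameter empty_rank are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as etree


def _add_text(container, chunk):
    """Attach chunk to container: as .text, or as .tail of its last child."""
    if not chunk:
        return
    if len(container):
        last = container[-1]
        last.tail = (last.tail or '') + chunk
    else:
        container.text = (container.text or '') + chunk


def get_tree(text, spans, empty_rank=-1):
    # Build the sortable tuples. The index i is only a tie-breaker, so that
    # Element objects themselves are never compared during the sort.
    events = []
    for i, (elem, start, end) in enumerate(spans):
        if start == end:
            events.append((start, empty_rank, start, i, elem))
        else:
            events.append((start, 1, -end, i, elem))
            events.append((end, 0, -start, i, elem))
    events.sort(key=lambda ev: ev[:4])

    root = etree.Element('root')  # dummy container, dropped at the end
    current = root                # the "current container" (CC)
    stack = []
    last_used = 0
    for offset, kind, _, _, elem in events:
        # Pending text always belongs to the CC as it is *before* this event.
        _add_text(current, text[last_used:offset])
        last_used = offset
        if kind == 1:             # element starts: it becomes the new CC
            current.append(elem)
            stack.append(current)
            current = elem
        elif kind == 0:           # element ends: go back to its parent
            current = stack.pop()
        else:                     # empty element: just insert it
            current.append(elem)
    _add_text(root, text[last_used:])  # text after the last span, if any
    return root[0]                # assumes one outermost span


text = 'aaa aaa aaabbb bbbaaa'
spans = [
    (etree.Element('a'), 0, 21),
    (etree.Element('b'), 11, 18),
    (etree.Element('c'), 18, 18),
]
tree = get_tree(text, spans)
print(etree.tostring(tree, encoding='unicode'))
# <a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>
```

In this sketch, passing empty_rank=0.5 (any value between 0 and 1) makes an empty element sort after the elements ending at the same offset, which should give the '<b>bbb bbb</b><c />' variant. Build fresh Element objects for each call, since get_tree modifies the elements it is given.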

--
Gabriel Genellina