Hello, I am looking for ideas on how to handle un-nesting tags.
I know I can use something built on top of htmltidy, but I am rather wondering whether this could be done using only the Python standard library.

My input tags cannot be crossed (I mean "<a> w1 <b> w2 </a> w3 </b>" is impossible in my input).

Actually, I had produced some data like this (line number / content):

    0 <a>
    1 <b>
    2 <c>
    3 w1
    4 w2
    5 </a>
    6 w3
    7 <d>
    8 w4
    9 </b>
    10 </d>
    11 </c>

where in fact I should have:

    0 <b>
    1 <c>
    2 <a>
    3 w1
    4 w2
    5 </a>
    6 w3
    7 <d>
    8 w4
    9 </d>
    10 </c>
    11 </b>

I am wondering how I can repair that. I have built a small script which already does that, but as I know there are clever brains here, maybe I will get some better suggestions...

(I need to clean up/rewrite my code, but here is how it works: it first finds the paired opening/closing tags, their widths and positions; then, from the smallest to the largest, it encloses the previous text inside the current tag and builds a text that will be the next one to be enclosed, and so on.)
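
For illustration, here is a minimal sketch of one possible way to do the repair with plain Python and no third-party library. It is not the script described above but a different technique: each tag is reduced to the range of words it encloses, and the tags are then re-emitted around those words using a stack. The function name `renest` and the token-list input format are hypothetical. The sketch assumes that each tag name occurs only once, that every tag has both an opening and a closing token, and that every tag encloses at least one word. When two tags enclose exactly the same words, the one that opened first in the input is placed outermost, which matches the expected output in the example.

```python
def renest(tokens):
    """Return the tokens with tags re-emitted so that they nest properly.

    Assumptions (not from the original post): each tag name occurs once,
    every tag has an opening and a closing token, and every tag encloses
    at least one word.
    """
    words = []   # non-tag tokens, in input order
    first = {}   # tag name -> index of the first word after its opening
    last = {}    # tag name -> index of the last word before its closing
    order = []   # tag names in the order their opening tokens appear

    for tok in tokens:
        if tok.startswith("</"):
            last[tok[2:-1]] = len(words) - 1
        elif tok.startswith("<"):
            first[tok[1:-1]] = len(words)
            order.append(tok[1:-1])
        else:
            words.append(tok)

    # Group tags by the word they start on. Within a group, wider spans
    # open first; ties keep the original opening order.
    opens_at = {}
    for rank, name in enumerate(order):
        opens_at.setdefault(first[name], []).append((-last[name], rank, name))

    out, stack = [], []
    for i, word in enumerate(words):
        for _, _, name in sorted(opens_at.get(i, [])):
            out.append("<%s>" % name)
            stack.append(name)
        out.append(word)
        # Close every tag whose span ends on this word, innermost first.
        while stack and last[stack[-1]] == i:
            out.append("</%s>" % stack.pop())

    if stack:
        # Happens only when word ranges truly cross, e.g.
        # "<a> w1 <b> w2 </a> w3 </b>".
        raise ValueError("crossed tags: %r" % stack)
    return out


broken = "<a> <b> <c> w1 w2 </a> w3 <d> w4 </b> </d> </c>".split()
print(" ".join(renest(broken)))
# <b> <c> <a> w1 w2 </a> w3 <d> w4 </d> </c> </b>
```

Because the word ranges are either nested or disjoint, a single pass over the words with a stack is enough, and no sorting by tag width is needed beyond ordering the tags that open on the same word.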