Hello, I have HTML input to which I apply some changes.

Feature 1:
==========
I want to transform all the text, but if the text is inside an "a href" tag, I want to leave it as it is. The HTML is not necessarily well-formed, so I would like to do this using BeautifulSoup (or maybe another tolerant parser).

As a test case, suppose I want to uppercase all the text except the text that is within "a href" tags:

ExampleString = """
<footag>Lorem Ipsum</footag> is simply dummy text of <a href="junk.html">the printing</a> and <a href="junk2.html">typesetting <b>industry</b>.</a> Thanks."""

When applying the text transform, I want to obtain:

<footag>LOREM IPSUM</footag> IS SIMPLY DUMMY TEXT OF <a href="junk.html">the printing</a> AND <a href="junk2.html">typesetting <b>industry</b>.</a> THANKS.

Feature 2:
==========
Another thing I may want to do: if the text I would normally transform is inside an "a href" tag, then do not transform it, but insert the result of the text transformation just after the "</a>".

Using the same example as input, applying feature 2 would give something like this:

<footag>LOREM IPSUM</footag> IS SIMPLY DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE PRINTING</feat2> AND <a href="junk2.html">typesetting <b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2> THANKS.

==========
Thanks for your help

-- 
http://mail.python.org/mailman/listinfo/python-list
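
A minimal sketch of how both features might be approached, assuming Python 3 and the standard-library html.parser.HTMLParser as the tolerant parser rather than BeautifulSoup. The names LinkAwareTransformer, transform_html and echo_tag are made up for illustration. The idea is to re-emit the markup event by event, apply the transform only to text outside "a href" elements, and keep a transformed copy of each link body for feature 2.

```python
from html.parser import HTMLParser


class LinkAwareTransformer(HTMLParser):
    """Re-emit the parsed HTML, transforming text outside <a href=...> only."""

    def __init__(self, transform, echo_tag=None):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.transform = transform  # callable applied to text, e.g. str.upper
        self.echo_tag = echo_tag    # hypothetical wrapper tag for feature 2
        self.out = []               # pieces of the output document
        self.echo = []              # transformed copy of the current link body
        self.in_link = False        # True while inside an <a href=...> element

    def _emit(self, raw, is_text=False):
        changed = self.transform(raw) if is_text else raw
        if self.in_link:
            self.out.append(raw)       # link content stays untouched
            self.echo.append(changed)  # transformed copy, used by feature 2
        else:
            self.out.append(changed)

    def handle_starttag(self, tag, attrs):
        is_href = tag == "a" and any(name == "href" for name, _ in attrs)
        if is_href and not self.in_link:
            self.in_link = True
            self.out.append(self.get_starttag_text())
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # e.g. <br/>, emitted once

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.out.append("</a>")
            if self.echo_tag:  # feature 2: append the transformed link body
                body = "".join(self.echo)
                self.out.append(f"<{self.echo_tag}>{body}</{self.echo_tag}>")
            self.echo = []
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data, is_text=True)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def transform_html(html, transform, echo_tag=None):
    """Feature 1 when echo_tag is None, feature 2 when it names a tag."""
    parser = LinkAwareTransformer(transform, echo_tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


example_string = (
    '<footag>Lorem Ipsum</footag> is simply dummy text of '
    '<a href="junk.html">the printing</a> and '
    '<a href="junk2.html">typesetting <b>industry</b>.</a> Thanks.'
)
feat1 = transform_html(example_string, str.upper)
feat2 = transform_html(example_string, str.upper, echo_tag="feat2")
print(feat1)
print(feat2)
```

With str.upper as the transform, the two print calls give the outputs wanted for feature 1 and feature 2 on the example string. As written, the contents of script and style elements would be transformed too, and an "a href" that is never closed would leave the rest of the input untouched, so real input may need extra checks.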