On Sunday, November 6, 2016 at 1:27:48 AM UTC-4, rosef...@gmail.com wrote:
> Considering the following HTML:
>
> <h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul>
> <div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>
>
> and the following list:
>
> ignore_list = ['example', 'lalala']
>
> My goal is this: while going through the HTML with BeautifulSoup, whenever I
> find an h2 whose ID is in my list (ignore_list), I should delete all the ul
> and li elements under it until I find another h2. I would then check whether
> the next h2 is in my ignore list; if it is, delete all the ul and li elements
> until I reach the next h2 (or, if there are no h2s left, delete the ul and li
> elements under the current one and stop).
>
> How I see the process going: you read all the h2s from top to bottom in the
> DOM. If the id of any of them is in ignore_list, delete all the ul and li
> elements under that h2 until you reach the NEXT h2. If there is no next h2,
> delete the ul and li elements and then stop.
>
> Here is the full HTML I am trying to work with: http://pastebin.com/Z3ev9c8N
>
> I am trying to delete all the ul and li elements after "See_also". How would I
> accomplish this in Python?
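For comparison, here is a minimal sketch of the process described above. It assumes bs4 (BeautifulSoup 4) with the built-in "html.parser" and, as in the sample markup, that the id is an attribute of the <h2> itself; strip_ignored_sections is a hypothetical helper name. Only siblings of each ignored heading are examined, so a <ul> nested inside a <div> is left alone, and decomposing a <ul> removes its <li> children with it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def strip_ignored_sections(soup, ignore_list):
    """Delete each <ul> that follows an ignored <h2>, up to the next <h2>.

    Hypothetical helper; assumes the id is an attribute of the <h2> itself,
    as in the sample HTML. Only siblings of the heading are examined.
    """
    ignored = set(ignore_list)
    # find_all() returns a list, so decomposing other tags does not disturb this loop.
    for h2 in soup.find_all('h2'):
        if h2.get('id') not in ignored:
            continue
        # Snapshot the siblings first: decompose() unlinks tags, which would
        # break a live next_siblings generator.
        for sibling in list(h2.next_siblings):
            if not isinstance(sibling, Tag):
                continue  # skip whitespace and other text nodes
            if sibling.name == 'h2':
                break  # reached the next section heading
            if sibling.name == 'ul':
                sibling.decompose()  # removes the <ul> and all of its <li>s
    return soup


html = '''<h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul>
<div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>
<h2 id="lalala">more</h2> <ul><li>gone</li></ul>
<h2 id="keep">kept</h2> <ul><li>stays</li></ul>'''

soup = strip_ignored_sections(BeautifulSoup(html, 'html.parser'), ['example', 'lalala'])
print([li.get_text() for li in soup.find_all('li')])  # ['zz', 'stays']
```

Unlike a single find_next_sibling('ul') call, which removes only the first matching list, this loop removes every ul between an ignored heading and the next h2, and it simply runs to the end of the document when no further h2 exists.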
I got it working with the following solution:

# Remove content I don't want.
# body is the parsed <body> Tag; ignore_list holds heading texts here.
for element in body.find_all('h2'):
    try:
        # Headings carry an "[edit]" suffix; strip it before comparing.
        current_h2 = element.get_text().replace('[edit]', '')
        if current_h2 in ignore_list:
            next_div = element.find_next_sibling('div')
            if next_div is not None:
                next_div.decompose()
            next_ul = element.find_next_sibling('ul')
            if next_ul is not None:
                next_ul.decompose()
    except (AttributeError, TypeError):
        # The try/except sits inside the loop so that `continue` is legal here.
        continue

--
https://mail.python.org/mailman/listinfo/python-list