On Sunday, November 6, 2016 at 1:27:48 AM UTC-4, rosef...@gmail.com wrote:
> Considering the following html:
> 
>     <h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul>
>     <div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>
> 
> and the following list:
> 
>     ignore_list = ['example','lalala']
> 
> My goal: while going through the HTML with BeautifulSoup, if I find an h2 
> whose id is in my list (ignore_list), I should delete all the ul and li 
> elements under it until I reach another h2. I would then check whether that 
> next h2 is in my ignore list; if it is, delete all the ul and li elements 
> until the next h2 (or, if there are no h2s left, delete the ul and li 
> elements under the current one and stop).
> 
> How I see the process going: you read all the h2s from top to bottom in the 
> DOM. If the id of any of them is in the ignore_list, delete all the ul and 
> li elements under that h2 until you reach the NEXT h2. If there is no next 
> h2, delete the ul and li elements and stop.
> 
> Here is the full HTML I am trying to work with: http://pastebin.com/Z3ev9c8N
> 
> I am trying to delete all the ul and li elements after "See_also". How would 
> I accomplish this in Python?


I got it working with the following solution:

    # Remove content I don't want.
    # Assumes `body` is a parsed BeautifulSoup tag and `ignore_list` holds
    # heading text; `continue` implies an enclosing loop that is not shown.
    try:
        for element in body.find_all('h2'):
            # Heading text without the "[edit]" link label.
            current_h2 = element.get_text().replace('[edit]', '')
            if current_h2 in ignore_list:
                # Remove the first <div> and <ul> siblings after this heading.
                next_div = element.find_next_sibling('div')
                if next_div is not None:
                    next_div.decompose()
                next_ul = element.find_next_sibling('ul')
                if next_ul is not None:
                    next_ul.decompose()
    except (AttributeError, TypeError):
        continue
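
For anyone who wants the id-based behaviour described in the original question (match on the h2 id, then delete every ul up to the next h2), here is a rough sketch of how that might look. The function name remove_lists_under_ignored_h2 is made up for illustration, and the code assumes the ul elements are siblings of the h2, which holds for the small sample above but may not hold for the full pastebin page.

```python
from bs4 import BeautifulSoup


def remove_lists_under_ignored_h2(soup, ignore_ids):
    """Delete each <ul> sibling after an ignored <h2>, up to the next <h2>.

    Hypothetical helper: assumes headings and lists are siblings.
    """
    for h2 in soup.find_all("h2"):
        if h2.get("id") not in ignore_ids:
            continue
        # Collect first, delete afterwards, so the sibling walk is not
        # disturbed while it is still running.
        doomed = []
        for sibling in h2.find_next_siblings(True):
            if sibling.name == "h2":
                break  # reached the next heading at this level: stop
            if sibling.name == "ul":
                doomed.append(sibling)
        for tag in doomed:
            tag.decompose()  # removes the <ul> together with its <li> children
    return soup


# Sample input from the question (with the second heading closed properly).
html = (
    '<h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul> '
    '<div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>'
)
ignore_list = ["example", "lalala"]

soup = BeautifulSoup(html, "html.parser")
remove_lists_under_ignored_h2(soup, ignore_list)
print(soup)
```

Unlike the snippet above, this matches on the id attribute rather than the heading text, and it stops at the next sibling h2 instead of removing only the first div and ul that follow.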