I have the following code to extract certain links from a webpage:

    from bs4 import BeautifulSoup
    import urllib2, sys
    import re

    def tonaton():
        site = "http://tonaton.com/en/job-vacancies-in-ghana"
        hdr = {'User-Agent' : 'Mozilla/5.0'}
        req = urllib2.Request(site, headers=hdr)
        jobpass = urllib2.urlopen(req)
        invalid_tag = ('h2')
        soup = BeautifulSoup(jobpass)
        print soup.find_all('h2')

The links are contained in the 'h2' tags, so I get the links as follows:

    <h2><a href="/en/cashiers-accra">cashiers </a></h2>
    <h2><a href="/en/cake-baker-accra">Cake baker</a></h2>
    <h2><a href="/en/automobile-technician-accra">Automobile Technician</a></h2>
    <h2><a href="/en/marketing-officer-accra-4">Marketing Officer</a></h2>

But I'm interested in getting rid of all the 'h2' tags so that I have only the links, in this manner:

    <a href="/en/cashiers-accra">cashiers </a>
    <a href="/en/cake-baker-accra">Cake baker</a>
    <a href="/en/automobile-technician-accra">Automobile Technician</a>
    <a href="/en/marketing-officer-accra-4">Marketing Officer</a>

I therefore updated my code to look like this:

    def tonaton():
        site = "http://tonaton.com/en/job-vacancies-in-ghana"
        hdr = {'User-Agent' : 'Mozilla/5.0'}
        req = urllib2.Request(site, headers=hdr)
        jobpass = urllib2.urlopen(req)
        invalid_tag = ('h2')
        soup = BeautifulSoup(jobpass)
        jobs = soup.find_all('h2')
        for tag in invalid_tag:
            for match in jobs(tag):
                match.replaceWithChildren()
        print jobs

But I couldn't get it to work, even though that was the best logic I could come up with. I'm a newbie, though, so I know there is probably a better way to do this.

Any help will be gratefully appreciated.

Thanks

-- 
https://mail.python.org/mailman/listinfo/python-list
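
For reference, here is a minimal sketch of one way to get the bare links, offered as an illustration rather than the definitive fix. Instead of trying to strip the 'h2' wrappers afterwards, it looks inside each 'h2' and collects the 'a' tags it contains. The helper name extract_job_links and the SAMPLE_HTML string are hypothetical: the sample markup is copied from the output quoted in the post, and the live page is not fetched, so its real structure is an assumption. One detail that may also matter in the updated code: ('h2') is a plain string, not a tuple (a one-element tuple is written ('h2',)), so the outer loop iterates over the characters 'h' and '2'.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output quoted in the post. The live page is
# not fetched here, so its real structure is an assumption.
SAMPLE_HTML = """
<h2><a href="/en/cashiers-accra">cashiers </a></h2>
<h2><a href="/en/cake-baker-accra">Cake baker</a></h2>
<h2><a href="/en/automobile-technician-accra">Automobile Technician</a></h2>
<h2><a href="/en/marketing-officer-accra-4">Marketing Officer</a></h2>
"""


def extract_job_links(html):
    """Return the <a> tags found inside <h2> tags (hypothetical helper)."""
    # "html.parser" is the parser bundled with Python's standard library.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for heading in soup.find_all("h2"):
        # Searching inside each <h2> yields the links without the wrapper.
        links.extend(heading.find_all("a"))
    return links


links = extract_job_links(SAMPLE_HTML)
for link in links:
    # Printing a tag writes out its markup, e.g. <a href="...">...</a>
    print(link)
```

In the original function, the same loop could be applied to the soup built from the urllib2 response in place of SAMPLE_HTML. The sketch uses print(link) with a single argument, which should behave the same under Python 2 and Python 3.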