I am using Beautiful Soup to parse an HTML document, and I want to find all text that is not contained inside any anchor element.
I came up with the code below, which prints the href of every link, but I need the opposite. How can I modify it to get only the plain text using Beautiful Soup, so that I can do a find and replace and modify the soup?

    for a in soup.findAll('a', href=True):
        print(a['href'])

Example:

    <html><body>
    <div> <a href="www.test1.com/identify">test1</a> </div>
    <div><br></div>
    <div><a href="www.test2.com/identify">test2</a></div>
    <div><br></div><div><br></div>
    <div>
    This should be identified
    Identify me 1
    Identify me 2
    <p id="firstpara" align="center"> This paragraph should be<b> identified </b>.</p>
    </div>
    </body></html>

Output:

    This should be identified
    Identify me 1
    Identify me 2
    This paragraph should be identified.

The goal is to find the text that is not within `<a></a>`, then find "Identify" in that text and replace it with "Replaced". The final output should look like this:

    <html><body>
    <div> <a href="www.test1.com/identify">test1</a> </div>
    <div><br></div>
    <div><a href="www.test2.com/identify">test2</a></div>
    <div><br></div><div><br></div>
    <div>
    This should be identified
    Replaced me 1
    Replaced me 2
    <p id="firstpara" align="center"> This paragraph should be<b> identified </b>.</p>
    </div>
    </body></html>

Thanks for your time and help!

--
http://mail.python.org/mailman/listinfo/python-list
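
For illustration, here is a minimal sketch of one possible approach. It is not a definitive answer, and it assumes the newer `bs4` package (Beautiful Soup 4) with Python's built-in `html.parser` rather than the older `BeautifulSoup` module the snippet above may be using. The helper names `text_outside_anchors` and `replace_outside_anchors` are hypothetical, made up for this example. The idea is to walk every text node, skip the ones that have an `<a>` ancestor, and replace the remaining ones in place so the soup itself is modified.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

HTML = """<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><br></div>
<div><a href="www.test2.com/identify">test2</a></div>
<div><br></div><div><br></div>
<div>
This should be identified
Identify me 1
Identify me 2
<p id="firstpara" align="center"> This paragraph should be<b> identified </b>.</p>
</div>
</body></html>"""


def text_outside_anchors(soup):
    """Return the non-blank text nodes that are not inside any <a> element."""
    return [
        node.strip()
        for node in soup.find_all(string=True)
        if node.strip() and node.find_parent("a") is None
    ]


def replace_outside_anchors(soup, old, new):
    """Replace `old` with `new` in every text node that has no <a> ancestor."""
    # Copy the result into a list first, because replace_with() edits the tree.
    for node in list(soup.find_all(string=True)):
        if node.find_parent("a") is not None:
            continue  # text inside a link: leave it untouched
        if old in node:
            node.replace_with(node.replace(old, new))


soup = BeautifulSoup(HTML, "html.parser")
print(text_outside_anchors(soup))
replace_outside_anchors(soup, "Identify", "Replaced")
print(soup)
```

Because only text nodes are visited, the `href` attribute values (which contain a lowercase "identify") are never touched, and the replacement is case-sensitive, so "identified" is left alone as well.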