On Fri, 20 Mar 2015 00:18:33 -0700, Sayth Renshaw wrote:

> Just finding it odd that the next sibling is a "\n" and not the next
> <td> otherwise that would be the perfect solution.

Whitespace between elements creates a text node in the parsed document. This is correct, because a browser also treats whitespace between elements as whitespace.

<a href="blah1">text1</a><a href="blah2">text2</a>

will be displayed differently to

<a href="blah1">text1</a> <a href="blah2">text2</a>

in a browser, because the space between the two <a> elements in the second case is a text node in the DOM.

A newline has the same effect (for display purposes a browser treats it as just whitespace), but in the DOM the text node will contain the newline rather than a space.

bs4 tries to parse the HTML the same way a browser does, so you get all the text nodes, including the whitespace between elements, which includes any newlines.

-- 
Denis McMahon, denismfmcma...@gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list
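
For illustration, here is a minimal sketch of the behaviour described in this message. It assumes bs4 is installed and uses the standard library's "html.parser" backend; the sample markup and the variable names are made up for the example and are not from the original thread.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the two <td> elements are separated by a newline.
html = "<table><tr><td>first</td>\n<td>second</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

first_td = soup.find("td")

# The immediate sibling is the whitespace text node, not the next <td>.
whitespace_node = first_td.next_sibling

# find_next_sibling() only looks at tags, so it steps over the text node.
second_td = first_td.find_next_sibling("td")

print(repr(whitespace_node))
print(second_td.get_text())
```

So if what you want is the next <td> rather than the next node of any kind, find_next_sibling("td") is probably the simpler route than calling next_sibling twice, since it does not depend on how much whitespace the source happens to contain.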