Hi, I'm trying to use rvest to scrape a page, and I am having difficulty excluding superscript child elements via a CSS selector. For example, here I've read the HTML and selected the nodes:

    p <- read_html(targetUrl)
    p %>% html_nodes("td.xyz")

The result looks something like this:

    {xml_nodeset (20)}
    [1] <td class="xyz" width="50%">Foo<font size="-1"><sup>9</sup></font>:</td>
    [2] <td class="xyz" width="50%">Bar<font size="-1"><sup>3</sup></font>:</td>
    [...]

I would like to extract the words "Foo" and "Bar" without the superscripts by passing the result along to html_text(). I thought something like this would work, but it returns just the superscripts:

    p %>% html_nodes("td.xyz") %>% html_nodes(":not(sup)") %>% html_text()

Perhaps I’m using the :not() selector improperly. Any suggestions on how to get this to work properly?

Thanks.
James
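
P.S. In case it helps, here is a minimal, self-contained version of the problem. The inline HTML is my own stand-in for the real page, built from the two rows shown above, and the names html, cells, attempt and words are made up for this example. The last step is only a sketch of a fallback I may use: as far as I understand, CSS selectors match elements rather than text nodes, so it switches to XPath text() and does not answer the :not() question itself.

```r
library(rvest)

# Stand-in for the real page (an assumption based on the output shown above).
html <- '<table><tr>
  <td class="xyz" width="50%">Foo<font size="-1"><sup>9</sup></font>:</td>
  <td class="xyz" width="50%">Bar<font size="-1"><sup>3</sup></font>:</td>
</tr></table>'

p <- read_html(html)
cells <- p %>% html_nodes("td.xyz")

# The attempt from my message: this selects elements that are not <sup>,
# so it cannot pick out the bare text "Foo" or "Bar".
attempt <- cells %>% html_nodes(":not(sup)") %>% html_text()

# Possible fallback (a sketch, not a CSS solution): take the first text node
# of each matching <td> with XPath, which skips the <font>/<sup> children.
words <- p %>%
  html_nodes(xpath = "//td[@class='xyz']/text()[1]") %>%
  html_text()

print(words)  # should give "Foo" and "Bar" for the stand-in HTML
```

I have not tried the fallback against the real page, where a cell might start with whitespace or another element, so I would still prefer a selector-based answer if one exists.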