Hello, All:
Thanks to Rasmus Liland, William Michels, and Luke Tierney for answering
my earlier web scraping question. With their help, I've made progress.
Sadly, I still have a problem: One field contains "<br/>", which
XML::readHTMLTable drops, so the text on either side runs together:
sosURL <-
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
sosChars <- RCurl::getURL(sosURL)
MOcan <- XML::readHTMLTable(sosChars)
MOcan[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"
(Seneca <- regexpr('SENECA', sosChars))
substring(sosChars, Seneca-22, Seneca+14)
[1] "4476 FIVE MILE RD<br/>SENECA MO 64865"
How can I get essentially the same result but without having
XML::readHTMLTable suppress "<br/>"?
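One workaround I can imagine, offered only as a sketch and not as the recommended method, is to replace each "<br/>" in the raw text with a newline before parsing. The example below uses a small made-up table, demoChars, as a stand-in for sosChars; the names demoChars, demoFixed, demoDoc and demoTab are my own inventions, and I am assuming that the parser keeps a newline inside a cell.

```r
# Hypothetical stand-in for sosChars; the real page is larger and may differ.
demoChars <- paste0(
  "<table><tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>JANE DOE</td><td>4476 FIVE MILE RD<br/>SENECA MO 64865</td></tr>",
  "</table>")

# Turn <br>, <br/> and <br /> into a newline before parsing.
demoFixed <- gsub("<br[[:space:]]*/?>", "\n", demoChars, ignore.case = TRUE)

# Parse only if the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  demoDoc <- XML::htmlParse(demoFixed, asText = TRUE)
  demoTab <- XML::readHTMLTable(demoDoc, stringsAsFactors = FALSE)
  print(demoTab[[1]][1, 2])
}
```

If a newline is not wanted, any other separator, such as ", ", could be used as the replacement string instead.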
NOTE: I get something very similar with xml2::read_html and
rvest::html_table:
sosPointers <- xml2::read_html(sosChars)
MOcan2 <- rvest::html_table(sosPointers)
MOcan2[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"
MOcan2 does not have names, and some of the fields are
automatically converted to integers, which I think is not smart in this
application.
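On the integer conversion, a possible sketch: I believe rvest 1.0.0 and later accept convert = FALSE in rvest::html_table, which should leave every column as character. I am assuming that version here, so the code checks for it first; the table is again a made-up stand-in, and demoHtml, demoPage, demoTabs and zipClass are names I introduced for illustration.

```r
# Hypothetical stand-in table, not the real page.
demoHtml <- paste0(
  "<table><tr><th>Name</th><th>Zip</th></tr>",
  "<tr><td>JANE DOE</td><td>64865</td></tr></table>")

zipClass <- NA_character_

# Assumes rvest >= 1.0.0, where html_table() has a 'convert' argument.
if (requireNamespace("xml2", quietly = TRUE) &&
    requireNamespace("rvest", quietly = TRUE) &&
    utils::packageVersion("rvest") >= "1.0.0") {
  demoPage <- xml2::read_html(demoHtml)
  # convert = FALSE asks html_table() not to guess column types.
  demoTabs <- rvest::html_table(demoPage, convert = FALSE)
  zipClass <- class(demoTabs[[1]]$Zip)
  print(zipClass)
}
```

With an older rvest the block does nothing, and the columns would have to be converted back with as.character afterwards.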
Thanks,
Spencer Graves
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.