Dear All,

I am just learning to program in R. I want to extract the reviews from a page and keep looping until I have extracted them from all pages:

#load rvest (provides read_html, html_nodes, html_node, html_attr, html_text)
library(rvest)

#specify the first page URL
fpURL <- 'https://wordpress.org/support/plugin/easyrecipe/reviews/'

#read the HTML contents in the first page URL
contentfpURL <- read_html(fpURL)

#identify the anchor tags in the first page URL
fpAnchors <- html_nodes(contentfpURL, css='a.bbp-topic-permalink')

#extract the HREF attribute value of each anchor tag
fpHREF <- html_attr(fpAnchors, 'href')

#create empty lists to store titles & contents found in the HREF attribute value of each anchor tag
titles = c()
contents = c()

#loop the following actions for each HREF found on the first page
for (u in fpHREF) {
  #read the HTML content of the review page
  fpURL = read_html(u)

  #identify the title anchor and read the title text
  fpreviewT = html_text(html_nodes(fpURL, css='h1.page-title'))

  #identify the content anchor and read the content text
  fpreviewC = html_text(html_nodes(fpURL, css='div.bbp-topic-content'))

  #store the review titles and contents in the previous lists
  titles = c(titles, fpreviewT)
  contents = c(contents, fpreviewC)
}

#identify the anchor tag pointing to the next summary page
npAnchor <- html_text(html_node(contentfpURL, css='a.next page-numbers'))

#extract the HREF attribute value of the anchor tag pointing to the next summary page
npHREF <- html_attr(npAnchor, 'href')

#loop the following actions for every next summary page HREF attribute
for (u in npHREF) {
  #specify the URL of the summary page
  spURL <- read_html('npHREF')

  #identify all the anchor tags on that summary page
  spAnchors <- html_nodes(spURL, css='a.bbp-topic-permalink')

  #extract the HREF attribute value of each anchor tag
  spHREF <- html_attr(spAnchors, 'href')

  #loop the following actions for each HREF found on that summary page
  for (u in fpHREF) {
    #read the HTML contents of the review page
    spURL = read_html(u)

    #identify the title anchor and read the title text
    spreviewT = html_text(html_nodes(spURL, css='h1.page-title'))

    #identify the content anchor and read the content text
    spreviewC = html_text(html_nodes(spURL, css='div.bbp-topic-content'))

    #store the review titles and contents in the previous lists
    titles = c(titles, spreviewT)
    contents = c(contents, spreviewC)
  }
}

I got stuck at the step that extracts the HREF attribute value of the anchor tag pointing to the next summary page, with this error:

Error in UseMethod("xml_attr") :
  no applicable method for 'xml_attr' applied to an object of class "character"

I would appreciate any help with this task. Thanks in advance.

---Tiffany
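
A note on the error, offered as a hedged sketch rather than a definitive fix: html_attr() works on nodes, but html_text() returns a character vector, so passing its result on produces this kind of 'xml_attr' error. The selector 'a.next page-numbers' also appears to be read as a descendant selector (an element named page-numbers inside a.next), whereas an anchor carrying both classes would be matched by 'a.next.page-numbers'. The HTML snippet and the example.org URLs below are made-up stand-ins, and the markup of the real page is an assumption.

```r
library(rvest)

# Hypothetical stand-in for a summary page; the real markup may differ.
page <- read_html('<html><body>
  <a class="bbp-topic-permalink" href="https://example.org/topic/1/">Review 1</a>
  <a class="next page-numbers" href="https://example.org/reviews/page/2/">Next</a>
</body></html>')

# html_text() gives back plain text (a character vector), not a node.
next_text <- html_text(html_node(page, css = 'a.next.page-numbers'))

# Passing that text to html_attr() fails; capture the message instead of stopping.
err <- tryCatch(html_attr(next_text, 'href'),
                error = function(e) conditionMessage(e))

# Keep the node itself, then read its href attribute.
next_node <- html_node(page, css = 'a.next.page-numbers')
next_href <- html_attr(next_node, 'href')
```

On a page without a next link, html_node() yields a missing node and html_attr() returns NA, which could serve as the stopping condition for a loop over the summary pages.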