Hi Philip,

If I understood correctly, you just want to get the URLs from a given Google search? I have some old code you could adapt, which extracts the main result links from a Google search. It uses XPath expressions via the lovely XML and RCurl packages:

> library(XML)
> library(RCurl)
>
> # Build the search URL: encode spaces and optionally wrap the term in quotes.
> getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE) {
+   search.term <- gsub(' ', '%20', search.term)
+   if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
+   paste('http://www.google', domain, '/search?q=', search.term, sep = '')
+ }
>
> # Fetch the results page and return the href of each main result link.
> getGoogleLinks <- function(google.url) {
+   doc <- getURL(google.url, httpheader = c("User-Agent" = "R (2.10.0)"))
+   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...) {})
+   ## the next line is very important to parse the html ##
+   nodes <- getNodeSet(html, "//a[@href][@class='l']")
+   return(sapply(nodes, function(x) xmlGetAttr(x, "href")))
+ }
>
> search.term <- "cran"
> quotes <- FALSE
>
> search.url <- getGoogleURL(search.term = search.term, quotes = quotes)
>
> links <- getGoogleLinks(search.url)
> links
 [1] "http://cran.r-project.org/" "http://cran.r-project.org/web/packages/" "http://www.cranmusic.com/" "http://www.sizes.com/units/cran.htm"
 [5] "http://www.r-project.org/" "http://www.myspace.com/cranmusic" "http://www.rozcran.co.uk/" "http://www.cherylcran.com/"
 [9] "http://www.chriscran.com/" "http://www.cranhillranch.com/" "http://www.yumsugar.com/6262265" "http://www.yumsugar.com/6262259"

Hope that helps a little,
Tony Breyal

On 16 Nov, 19:29, Philip Leifeld <leif...@coll.mpg.de> wrote:
> Hi,
>
> how can I parse Google search results? The following code returns
> "integer(0)" instead of "1" although the results of the query clearly
> contain the regex "cran".
>
> ####
> address <- url("http://www.google.com/search?q=cran")
> open(address)
> lines <- readLines(address)
> grep("cran", lines[3])
> ####
>
> Thanks
>
> Philip
>
> --
> Philip Leifeld
> Max Planck Institute for     | +49 (0) 1577 6830349 (mobile)
> Research on Collective Goods | +49 (0) 228 91416-73 (phone)
> MaxNetAging Doctoral Fellow  | +49 (0) 228 91416-62 (fax)
> Kurt-Schumacher-Str. 10      |
> 53113 Bonn, Germany          | http://www.philipleifeld.de
>
> ______________________________________________
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
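
P.S. On the quoted snippet: grep("cran", lines[3]) searches only the third line of the page, so it returns integer(0) whenever that one line has no match, even if other lines do. Here is a minimal base-R sketch of the difference. The `lines` vector is a made-up stand-in for what readLines() might return, not real Google output, and the regex-based href extraction is only an illustration; the XPath approach in my code above is likely to be more robust.

```r
# Hypothetical stand-in for the lines readLines() would return from a results page.
lines <- c(
  "<!doctype html>",
  "<html><head><title>cran - Google Search</title></head>",
  "<body>",
  "<a href=\"http://cran.r-project.org/\" class=\"l\">CRAN</a>",
  "</body></html>"
)

# Searching only the third element inspects a single line, which has no match here.
third_only <- grep("cran", lines[3])

# Searching the whole vector returns the index of every matching line.
all_lines <- grep("cran", lines)

# Crude href extraction with base R regular expressions (illustration only).
hits <- regmatches(lines, regexpr("href=\"[^\"]+\"", lines))
urls <- gsub("^href=\"|\"$", "", hits)
```

If all you need is to know whether "cran" occurs anywhere in the page, any(grepl("cran", lines)) gives a single TRUE or FALSE.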