clair.crossup...@googlemail.com wrote:
Dear R-help,
There seems to be a web page I am unable to download using RCurl. I
don't understand why it won't download:
library(RCurl)
my.url <-
"http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
getURL(my.url)
[1] ""
I like the irony that RCurl seems to have difficulties downloading an
article about R. Good thing it is just a matter of additional arguments
to getURL() or it would be bad news.
The followlocation parameter defaults to FALSE, so
getURL(my.url, followlocation = TRUE)
gets what you want.
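A minimal sketch of that call wrapped in a small helper. The helper name fetch.following.redirects and the maxredirs cap of 10 are assumptions for illustration, not part of the advice above; maxredirs is the standard libcurl option that limits how many redirects are followed.

```r
library(RCurl)

# Hypothetical helper: download a page, letting libcurl follow redirects.
# followlocation = TRUE makes libcurl re-request the URL named in a
# redirect's Location header; maxredirs (an assumption here) stops it
# from following redirects forever if the server loops.
fetch.following.redirects <- function(url, max.redirects = 10L) {
  getURL(url, followlocation = TRUE, maxredirs = max.redirects)
}
```

With network access, fetch.following.redirects(my.url) should then return the page text rather than "", provided the server still answers as it did in this thread.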
The way I found this was to run
getURL(my.url, verbose = TRUE)
and look at the information being sent from R
and received by R from the server.
This gives
* About to connect() to www.nytimes.com port 80 (#0)
* Trying 199.239.136.200... * connected
* Connected to www.nytimes.com (199.239.136.200) port 80 (#0)
> GET /2009/01/07/technology/business-computing/07program.html?_r=2 HTTP/1.1
Host: www.nytimes.com
Accept: */*
< HTTP/1.1 301 Moved Permanently
< Server: Sun-ONE-Web-Server/6.1
< Date: Mon, 26 Jan 2009 16:10:51 GMT
< Content-length: 0
< Content-type: text/html
< Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html&OQ=_rQ3D3&op=42fceb38q2fq5duarq5d3-z8q26--q24jq5djccq7bq5dcmq5dc1q5dq24...@-f-q2anq5dry8h@a88q3dz-dbyq...@q2aq5dc1bq26-q2aq26q5bddfq24df
<
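The 301 is the critical thing here: the server redirects to the URL in
the Location header and sends an empty body (Content-length: 0), which
is why getURL() returns "" unless it is told to follow the redirect.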
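To check for a redirect programmatically rather than by reading the verbose log, one option is RCurl's parseHTTPHeader(), which turns raw header lines into a named character vector that includes a "status" element. The sketch below feeds it some of the header lines from the log above, so it runs without network access. The helper names is.redirect and response.header are made up for this example.

```r
library(RCurl)

# Hypothetical helper: TRUE if a parsed header carries a 3xx status.
is.redirect <- function(header) {
  status <- as.integer(header[["status"]])
  status >= 300 && status < 400
}

# Header lines copied from the verbose output above (Location omitted).
hdr <- parseHTTPHeader(c("HTTP/1.1 301 Moved Permanently",
                         "Server: Sun-ONE-Web-Server/6.1",
                         "Content-length: 0",
                         "Content-type: text/html"))
is.redirect(hdr)

# Hypothetical helper for a live URL: collect the response header of a
# request made WITHOUT following redirects, using basicHeaderGatherer().
response.header <- function(url) {
  h <- basicHeaderGatherer()
  getURL(url, headerfunction = h$update)
  h$value()
}
```

With network access, is.redirect(response.header(my.url)) would show whether a given URL needs followlocation = TRUE.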
D.
Other web pages download fine; this is the first time I have been
unable to download a web page using the very nice RCurl package.
While I can download the web page using RDCOMClient, I would like
to understand why it doesn't work as above, please.
library(RDCOMClient)
my.url <-
"http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
ie <- COMCreate("InternetExplorer.Application")
txt <- list()
ie$Navigate(my.url)
NULL
while(ie[["Busy"]]) Sys.sleep(1)
txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
txt
$`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
[1] "Skip to article Try Electronic Edition Log ...
Many thanks for your time,
C.C
Windows Vista, running with administrator privileges.
sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RDCOMClient_0.92-0 RCurl_0.94-0
loaded via a namespace (and not attached):
[1] tools_2.8.1
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.