On 2020-07-17 07:54 -0400, Sam H wrote:
| On 2020-07-17 09:30 +0100, ruipbarradas wrote:
| | On 2020-07-16 20:59 -0500, luke-tier...@uiowa.edu wrote:
| | | At 08:45 on 15/07/20, Sam H wrote:
| | | | Hi,
| | | |
| | | | I am trying to download some
| | | | data using read.csv and it works
| | | | perfectly in RStudio and fails
| | | | in the R console in the terminal
| | | | in Ubuntu 18.04 after upgrading
| | | | from R 3.6.3 to 4.0.2.
| | |
| | | On my Ubuntu system the download
| | | with read.csv succeeds in an R
| | | console if I set the HTTPUserAgent
| | | and download.file.method options to
| | | match the ones used by RStudio.
| | |
| | | Given how picky the server is being
| | | I would worry about whether this use
| | | is in line with the site's terms of
| | | service.
| |
| | Yes, I thought it's a site policy
| | issue too. But the file can be
| | accessed and read/downloaded from
| | RStudio and Firefox so apparently
| | there's no reason why the R console
| | shouldn't.
|
| Hello,
|
| Thank you all very much for looking into this.
|
| I came across this problem when I was using TTR::stockSymbols()
| (https://github.com/joshuaulrich/TTR/blob/e6609b9f7621f3a4b1a204c159af61aebc89997e/R/WebData.R).
|
| As a workaround I added this function
| to my private R package and instead of
| read.csv I am now using
| data.table::fread() which properly
| (without failing) downloads the file
| and reads it.

Dear Sam,

Good thing you solved this.  Like Luke said, to use read.csv you need to
set the HTTPUserAgent option:

    options("HTTPUserAgent"="User-Agent: RStudio Desktop (1.3.959)")

... or with cURL directly:

    rasmus@twentyfive ~ % curl -H 'User-Agent: RStudio Desktop (1.3.959)' 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'

At 08:45 on 15/07/20, Sam H wrote:
| Before upgrading this worked in the R
| console in the terminal also without
| any issues.

In version 3.6.3, I was not able to run the lines:

    > R.Version()$version.string
    [1] "R version 3.6.3 (2020-02-29)"
    > options()[c("download.file.method", "HTTPUserAgent")]
    $<NA>
    NULL

    $HTTPUserAgent
    [1] "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"

    > x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
    > read.csv(x, as.is=TRUE, na="n/a")
    Error in file(file, "rt") :
      cannot open the connection to 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
    In addition: Warning message:
    In file(file, "rt") :
      cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download': HTTP status was '403 Forbidden'
    >

Running data.table::fread in 4.0.2:

    > options()[c("download.file.method", "HTTPUserAgent")]
    $<NA>
    NULL

    $HTTPUserAgent
    [1] "R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"

    > x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
    > data.table::fread(x, header=TRUE)[1:2,]
       Symbol               Name LastSale
    1:    TXG 10x Genomics, Inc.    89.19
    2:     YI          111, Inc.     6.53
       MarketCap IPOyear        Sector
    1:    $8.77B    2019 Capital Goods
    2:  $537.81M    2018   Health Care
                                               industry
    1: Biotechnology: Laboratory Analytical Instruments
    2:                         Medical/Nursing Services
                           Summary Quote V9
    1: https://old.nasdaq.com/symbol/txg NA
    2:  https://old.nasdaq.com/symbol/yi NA

Does anyone know what data.table::fread does differently from read.csv
here (so setting HTTPUserAgent is not needed)?  Without HTTPUserAgent, I
think data.table::fread just reports something like "libcurl/7.71.1",
like read.csv would have done ...

Best,
Rasmus
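
PS: Luke mentions setting both HTTPUserAgent and download.file.method to
match RStudio.  Here is a minimal sketch of doing that only for the
duration of one call, so the rest of the session keeps its defaults.
The helper name with_http_options is made up for illustration, and
"libcurl" as the method is my assumption; the thread does not say which
value RStudio actually uses.  The user-agent string is the one from the
options() call above.

```r
# Hypothetical helper (not from TTR or base R): evaluate `expr` with the
# HTTPUserAgent and download.file.method options temporarily changed.
with_http_options <- function(expr,
                              user_agent = "User-Agent: RStudio Desktop (1.3.959)",
                              method = "libcurl") {  # "libcurl" is an assumption
  old <- options(HTTPUserAgent = user_agent,
                 download.file.method = method)
  on.exit(options(old), add = TRUE)  # put the previous values back on exit
  force(expr)                        # expr is lazy, so it runs with the new options
}

x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
# Needs network access, and the server may still refuse the request:
# dat <- with_http_options(read.csv(x, as.is = TRUE, na = "n/a"))
```

Because expr is only evaluated inside the function, the read.csv call
sees the temporary options, and on.exit restores the old ones even if
the download fails.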