On 2020-07-17 07:54 -0400, Sam H wrote:
| On 2020-07-17 09:30 +0100, ruipbarradas wrote:
| | On 2020-07-16 20:59 -0500, luke-tier...@uiowa.edu wrote:
| | | On 15/07/20 at 08:45, Sam H wrote:
| | | | Hi,
| | | | 
| | | | I am trying to download some 
| | | | data using read.csv and it works 
| | | | perfectly in RStudio and fails 
| | | | in the R console in the terminal 
| | | | in Ubuntu 18.04 after upgrading 
| | | | from R 3.6.3 to 4.0.2. 
| | | 
| | | On my Ubuntu system the download 
| | | with read.csv succeeds in an R 
| | | console if I set the HTTPUserAgent 
| | | and download.file.method options to 
| | | match the ones used by RStudio.
| | | 
| | | Given how picky the server is being 
| | | I would worry about whether this use 
| | | is in line with the site's terms of 
| | | service.
| |
| | Yes, I thought it was a site policy 
| | issue too. But the file can be 
| | accessed and read/downloaded from 
| | RStudio and Firefox, so apparently 
| | there's no reason why the R console 
| | shouldn't be able to.
| 
| Hello,
| 
| Thank you all very much for looking into this.
| 
| I came across this problem when I was using TTR::stockSymbols()
| (https://github.com/joshuaulrich/TTR/blob/e6609b9f7621f3a4b1a204c159af61aebc89997e/R/WebData.R).
| 
| As a workaround I added this function 
| to my private R package and instead of 
| read.csv I am now using 
| data.table::fread() which properly 
| (without failing) downloads the file 
| and reads it.

Dear Sam,

Good thing you solved this.  

Like Luke said, to use read.csv you need 
to set the HTTPUserAgent option:

        options("HTTPUserAgent"="User-Agent: RStudio Desktop (1.3.959)")

... or with cURL directly:

        rasmus@twentyfive ~ % curl -H 'User-Agent: RStudio Desktop (1.3.959)' \
            'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
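
If you would rather not change the option for the whole session, a small wrapper can set it temporarily and restore the old value afterwards. This is only a minimal sketch: with_user_agent is a hypothetical helper name, not something from base R or RStudio, and it touches only HTTPUserAgent (download.file.method could be handled the same way).

```r
# Hypothetical helper (not part of base R): evaluate `expr` with a
# temporary HTTPUserAgent option, then restore the previous options.
with_user_agent <- function(ua, expr) {
  old <- options(HTTPUserAgent = ua)   # options() returns the old values
  on.exit(options(old), add = TRUE)    # restore even if `expr` fails
  force(expr)                          # `expr` is evaluated only here
}

# Usage (needs network access; the server may still refuse the request):
# x  <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
# ua <- "User-Agent: RStudio Desktop (1.3.959)"  # string from the options() call above
# dat <- with_user_agent(ua, read.csv(x, as.is = TRUE, na.strings = "n/a"))
```

Because R evaluates arguments lazily, the read.csv call is run inside the wrapper, after the option has been set.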

On 15/07/20 at 08:45, Sam H wrote:
| Before upgrading this worked in the R 
| console in the terminal also without 
| any issues.

In version 3.6.3, I was not able to 
run these lines either:

        > R.Version()$version.string
        [1] "R version 3.6.3 (2020-02-29)"
        > options()[c("download.file.method", "HTTPUserAgent")]
        $<NA>
        NULL
        
        $HTTPUserAgent
        [1] "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"
        
        > x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download";
        > read.csv(x, as.is=TRUE, na="n/a")
        Error in file(file, "rt") :
          cannot open the connection to 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
        In addition: Warning message:
        In file(file, "rt") :
          cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download': HTTP status was '403 Forbidden'
        >

Running data.table::fread in 4.0.2:

        > options()[c("download.file.method", "HTTPUserAgent")]
        $<NA>
        NULL
        
        $HTTPUserAgent
        [1] "R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"
        > x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download";
        > data.table::fread(x, header=TRUE)[1:2,]
           Symbol               Name LastSale
        1:    TXG 10x Genomics, Inc.    89.19
        2:     YI          111, Inc.     6.53
           MarketCap IPOyear        Sector
        1:    $8.77B    2019 Capital Goods
        2:  $537.81M    2018   Health Care
                                                   industry
        1: Biotechnology: Laboratory Analytical Instruments
        2:                         Medical/Nursing Services
                               Summary Quote V9
        1: https://old.nasdaq.com/symbol/txg NA
        2:  https://old.nasdaq.com/symbol/yi NA

Does anyone know what data.table::fread 
does differently from read.csv here (so 
that setting HTTPUserAgent is not needed)?  

Without HTTPUserAgent, I think 
data.table::fread just reports something 
like "libcurl/7.71.1", like read.csv 
would have done ...
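
One way to narrow this down might be to separate the download from the parsing: fetch the URL to a temporary file with download.file(), where the method and any extra headers can be chosen explicitly, and only then call read.csv on the local file. If fread also goes through a temporary file (I have not checked its source, so that is an assumption), the difference would lie in how the request is made rather than in the parsing. A rough sketch; read_csv_via_download is a hypothetical name, and the headers argument assumes R >= 3.6.0:

```r
# Hypothetical helper: download `url` to a temporary file, then parse it.
# `method` and `headers` are passed on to download.file(); further
# arguments in `...` go to read.csv().
read_csv_via_download <- function(url, method = "auto", headers = NULL, ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)     # clean up the temporary file
  status <- download.file(url, tmp, method = method, headers = headers,
                          mode = "wb", quiet = TRUE)
  if (status != 0L) stop("download failed with status ", status)
  read.csv(tmp, ...)
}

# Usage (needs network access), trying different methods in turn:
# x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
# read_csv_via_download(x, method = "libcurl", as.is = TRUE, na.strings = "n/a")
# read_csv_via_download(x, method = "curl",    as.is = TRUE, na.strings = "n/a")
```

If one method succeeds where read.csv(x) fails with 403, that would point at the request itself (user agent or transport) rather than at read.csv.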

Best,
Rasmus

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
