Carl Boettiger created ARROW-15060:
--------------------------------------

             Summary: open_dataset() on csv files lacks support for compressed files
                 Key: ARROW-15060
                 URL: https://issues.apache.org/jira/browse/ARROW-15060
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
            Reporter: Carl Boettiger


Using open_dataset() on S3 buckets of csv files is game-changing, 
particularly given the additional support for database / dplyr operations 
over the remote connection and the widespread adoption of S3 buckets, even by 
old-school big data providers like NOAA.

It's not uncommon to encounter buckets of *.csv.gz files.  I know this 
should technically be unnecessary, since the server can compress "in flight", 
but compressed files are usually not an issue for R users because R's 
`connection` class automatically detects and gunzips them (over either POSIX 
or HTTP connections).  It would be really great if arrow could handle this 
case too. 
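
For illustration, here is a minimal sketch of the base R behavior described above. It uses only base R and a local temporary file; the column names and values are made up for the example, and the remote (S3 / HTTP) case is not shown.

```r
# Write a small gzip-compressed CSV to a temporary file.
# The data frame contents are hypothetical, for illustration only.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, open = "wt")
write.csv(data.frame(station = c("A", "B"), temp = c(1.5, 2.5)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path through a file() connection, which detects
# the gzip magic number and decompresses transparently; no explicit
# gunzip step is needed.
df <- read.csv(path)
print(df)

unlink(path)
```

The request in this issue is for open_dataset() to treat a directory or bucket of such *.csv.gz files the same way.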



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
