Carl Boettiger created ARROW-15060:
--------------------------------------
Summary: open_dataset() on csv files lacks support for compressed files
Key: ARROW-15060
URL: https://issues.apache.org/jira/browse/ARROW-15060
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Carl Boettiger
Using open_dataset() on S3 buckets of csv files is game-changing magic, particularly with all the additional support for database / dplyr operations over the remote connection, and given the widespread adoption of S3 buckets even by old-school big-data providers like NOAA. It's not uncommon to encounter buckets of *.csv.gz files. I know that technically this should be unnecessary, since compression can be done "in flight" by the server, but for R users this is usually not an issue, because R's `connection` class automatically detects and gunzips compressed files (over either POSIX or HTTP connections). It would be really great if arrow could handle this case too.
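To make the requested behavior concrete, here is a minimal, language-agnostic sketch (in Python, not R, and not arrow's or R's actual implementation) of transparent gzip detection for a local file. It assumes detection by content: every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952). The helper name `open_maybe_gzipped` is hypothetical.

```python
import csv
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_maybe_gzipped(path):
    """Hypothetical helper: open `path` for text reading, gunzipping if needed.

    The check sniffs the file's first bytes instead of trusting a .gz extension.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


if __name__ == "__main__":
    rows = [["x", "y"], ["1", "a"], ["2", "b"]]
    with tempfile.TemporaryDirectory() as d:
        plain = os.path.join(d, "data.csv")
        packed = os.path.join(d, "data.csv.gz")
        # Write the same rows once uncompressed and once gzip-compressed.
        with open(plain, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        with gzip.open(packed, "wt", newline="") as f:
            csv.writer(f).writerows(rows)
        # Both files read back identically through the same helper.
        for p in (plain, packed):
            with open_maybe_gzipped(p) as f:
                print(os.path.basename(p), list(csv.reader(f)))
```

Sniffing content also catches compressed objects that lack a `.gz` suffix; inferring the codec from the file extension alone is the simpler alternative. A remote (S3 or HTTP) source would need the same check applied to the first bytes of the response stream, which this sketch does not show.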