alamb opened a new issue, #16304:
URL: https://github.com/apache/datafusion/issues/16304
### Is your feature request related to a problem or challenge?

- Part of https://github.com/apache/datafusion/issues/13456

There are more and more blogs like [this](https://altinity.com/blog/the-future-has-arrived-parquet-on-iceberg-finally-outperforms-mergetree) that show examples of running queries against data on a remote object store.

I would like to compare the performance of DataFusion to these other systems, but I find it really hard to run the examples.

For example, the above blog post loads the data with:

```sql
INSERT INTO tripdata
SELECT * FROM s3('s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/*.parquet', NOSIGN)
SETTINGS max_threads=32, max_insert_threads=32, input_format_parallel_parsing=0;
```

When I try to follow the example in https://datafusion.apache.org/user-guide/cli/datasources.html#remote-files-directories to look at the same data in `datafusion-cli`, it doesn't work, and the error message is confusing:

```shell
$ datafusion-cli
DataFusion CLI v47.0.0
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
Error during planning: table 'datafusion.public.s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' not found
>
```

I also tried setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` as suggested, and it still fails:

```shell
$ AWS_ACCESS_KEY_ID=foo AWS_SECRET_ACCESS_KEY=bar datafusion-cli
DataFusion CLI v47.0.0
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
Error during planning: table 'datafusion.public.s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' not found
```

`CREATE EXTERNAL TABLE` gets further (it does try to read from S3), but the request is rejected with `403 Forbidden`:

```sql
> CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
Object Store error: The operation lacked the necessary privileges to complete for path nyc_taxi_rides/data/tripdata_parquet: Error performing HEAD https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet in 136.439542ms - Server returned non-2xx status code: 403 Forbidden:
```

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
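For comparison, the `NOSIGN` argument in the ClickHouse query above makes it send unsigned (anonymous) requests. A minimal sketch of checking the same bucket anonymously, assuming it permits unsigned `ListObjectsV2` requests (the `*.parquet` glob in the ClickHouse query suggests this but does not prove it): the hypothetical helper `s3_list_url` builds a virtual-hosted-style listing URL for the `us-east-1` region seen in the 403 error, which can then be fetched with plain `curl` and no credentials.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataFusion or the AWS CLI): build the
# S3 ListObjectsV2 URL for a bucket and prefix, virtual-hosted style.
# Fetching it with plain curl sends no Authorization header, i.e. an
# unsigned request, roughly what ClickHouse's NOSIGN option does.
s3_list_url() {
  bucket="$1"
  region="$2"
  prefix="$3"
  printf 'https://%s.s3.%s.amazonaws.com/?list-type=2&prefix=%s\n' \
    "$bucket" "$region" "$prefix"
}

# Usage (needs network access; assumes the bucket allows anonymous listing):
#   curl -s "$(s3_list_url altinity-clickhouse-data us-east-1 nyc_taxi_rides/data/tripdata_parquet/)"
```

If such a request returns an XML `ListBucketResult` rather than an `AccessDenied` error, that would indicate the data is readable without any credentials, which is the case `datafusion-cli` does not handle in the examples above.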