alamb opened a new issue, #16304:
URL: https://github.com/apache/datafusion/issues/16304

   ### Is your feature request related to a problem or challenge?
   
   - Part of https://github.com/apache/datafusion/issues/13456
   
   There are more and more blog posts like [this](https://altinity.com/blog/the-future-has-arrived-parquet-on-iceberg-finally-outperforms-mergetree) that show examples of running queries against data in remote object stores.
   
   I would like to compare the performance of DataFusion to these other systems, but I find it really hard to run the examples.
   
   For example, the above blog post includes this query:
   
   ```sql
   INSERT INTO tripdata
   SELECT * FROM s3('s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/*.parquet', NOSIGN)
   SETTINGS max_threads=32, max_insert_threads=32, input_format_parallel_parsing=0;
   ```
   
   When I try to follow the example in https://datafusion.apache.org/user-guide/cli/datasources.html#remote-files-directories to look at this same data in `datafusion-cli`, it doesn't work (and it gives me a confusing error message):
   
   ```shell
   $ datafusion-cli
   DataFusion CLI v47.0.0
   > select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
   Error during planning: table 'datafusion.public.s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' not found
   >
   ```
   
   I also tried setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` as suggested, and it still fails:
   
   ```shell
   $ AWS_ACCESS_KEY_ID=foo AWS_SECRET_ACCESS_KEY=bar datafusion-cli
   DataFusion CLI v47.0.0
   > select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
   Error during planning: table 'datafusion.public.s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' not found
   ```
   
   `CREATE EXTERNAL TABLE` gets further (it recognizes the S3 URL), but then fails with a `403 Forbidden` error:

   ```sql
   > CREATE EXTERNAL TABLE hits
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
   Object Store error: The operation lacked the necessary privileges to complete for path nyc_taxi_rides/data/tripdata_parquet: Error performing HEAD https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet in 136.439542ms - Server returned non-2xx status code: 403 Forbidden:
   ```
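
The `NOSIGN` argument in the ClickHouse query above means the requests are sent without AWS credentials. As a rough way to check, independently of any query engine, whether a bucket allows that kind of anonymous access, the sketch below builds an unsigned S3 `ListObjectsV2` URL from an `s3://` location. The helper name `unsigned_list_url`, the `us-east-1` default region (taken from the error message above), and the `max_keys` default are assumptions for illustration; none of this is part of DataFusion or ClickHouse.

```python
from urllib.parse import urlencode, urlparse


def unsigned_list_url(s3_url, region="us-east-1", max_keys=5):
    """Build an unsigned ListObjectsV2 URL for an s3:// location.

    Hypothetical helper: the region default is an assumption taken from
    the error message in this issue, not something looked up from S3.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix URL, got {s3_url!r}")

    bucket = parsed.netloc
    # S3 key prefixes do not start with a slash.
    prefix = parsed.path.lstrip("/")

    # list-type=2 selects the ListObjectsV2 REST API.
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": max_keys})

    # Virtual-hosted-style endpoint: https://<bucket>.s3.<region>.amazonaws.com/
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


if __name__ == "__main__":
    print(unsigned_list_url(
        "s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/"
    ))
```

Fetching the printed URL with `curl` or a browser, with no credentials attached, should return an XML listing if the bucket permits anonymous listing. A `403` on that request would suggest the bucket itself is rejecting the call, rather than the client's signing behavior.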
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


