alamb opened a new pull request, #16300:
URL: https://github.com/apache/datafusion/pull/16300

   ## Which issue does this PR close?
   
   
   - Closes https://github.com/apache/datafusion/issues/16299
   - Related to https://github.com/apache/datafusion/issues/13456
   
   ## Rationale for this change
   
   I want to be able to access public S3 buckets without providing (valid) S3 credentials.
   
   ## What changes are included in this PR?
   
   1. Add `skip_signature` option to `datafusion-cli` `CREATE EXTERNAL TABLE`
   2. Default to `skip_signature` when other credentials are not provided
   3. Update documentation
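   
   The defaulting rule in items 1 and 2 can be sketched roughly as follows. This is only an illustration of the precedence described in this PR, not the actual `datafusion-cli` code: the `S3Settings` struct and `should_skip_signature` function are hypothetical names, and treating "credentials provided" as "either key is set" is an assumption.
   
   ```rust
   /// Hypothetical summary of what the user supplied (not a real DataFusion type).
   #[derive(Debug, Default)]
   struct S3Settings {
       access_key_id: Option<String>,
       secret_access_key: Option<String>,
       /// Explicit skip-signature option, if the user set one.
       skip_signature: Option<bool>,
   }
   
   /// Returns true when requests should be sent unsigned.
   fn should_skip_signature(settings: &S3Settings) -> bool {
       // An explicit option always wins, even when credentials are present.
       if let Some(explicit) = settings.skip_signature {
           return explicit;
       }
       // Otherwise skip signing only when no credentials were provided (assumed rule).
       settings.access_key_id.is_none() && settings.secret_access_key.is_none()
   }
   
   fn main() {
       // No credentials, no option: default to unsigned requests.
       let anonymous = S3Settings::default();
       assert!(should_skip_signature(&anonymous));
   
       // Credentials provided: they take precedence, so requests are signed.
       let with_keys = S3Settings {
           access_key_id: Some("foo".to_string()),
           secret_access_key: Some("bar".to_string()),
           skip_signature: None,
       };
       assert!(!should_skip_signature(&with_keys));
   
       // Credentials provided, but the explicit option overrides them.
       let overridden = S3Settings {
           skip_signature: Some(true),
           ..with_keys
       };
       assert!(should_skip_signature(&overridden));
   }
   ```
   
   The three cases in `main` correspond to the three manual scenarios in this description: no credentials, invalid credentials, and invalid credentials with the option set explicitly.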
   
   Before this PR:
   ```sql
   DataFusion CLI v47.0.0
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   Object Store error: Generic S3 error: the credential provider was not enabled
   ```
   
   After this PR:
   
   ```sql
   DataFusion CLI v48.0.0
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   0 row(s) fetched.
   Elapsed 1.575 seconds.
   
   > select count(*) from nyc_taxi_rides;
   +------------+
   | count(*)   |
   +------------+
   | 1310903963 |
   +------------+
   1 row(s) fetched.
   Elapsed 3.011 seconds.
   ```
   
   ## Are these changes tested?
   
   Yes, new unit tests are added and I tested it manually
   
   For example, if you provide credentials, they take precedence over the default of skipping the signature:
   ```shell
   AWS_ACCESS_KEY_ID=foo AWS_SECRET_ACCESS_KEY=bar  cargo run -p datafusion-cli
   ```
   
   ```sql
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   Object Store error: Generic S3 error: Error performing list request: Error performing GET https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data?list-type=2&prefix=nyc_taxi_rides%2Fdata%2Ftripdata_parquet%2F in 134.200375ms - Server returned non-2xx status code: 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>
   <Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>foo</AWSAccessKeyId><RequestId>ZAEM63Q02FQXYMTA</RequestId><HostId>mYh2PUtKzDxjrPA4vQm4d+Qae9TiNpCUDDTS5BP4jTayKVE4BQbSpT/+HSIAdzt3lne6G0sxNmE=</HostId></Error>
   ```
   
   But you can override this by setting `AWS.SKIP_SIGNATURE` explicitly:
   ```sql
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/'
   OPTIONS(AWS.SKIP_SIGNATURE 'true');
   0 row(s) fetched.
   Elapsed 1.455 seconds.
   ```
   
   ## Are there any user-facing changes?
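   Yes. `datafusion-cli` is easier to use: public S3 buckets can be read without credentials, and `AWS.SKIP_SIGNATURE` is available as a `CREATE EXTERNAL TABLE` option.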


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

