alamb opened a new issue, #16302:
URL: https://github.com/apache/datafusion/issues/16302

   ### Is your feature request related to a problem or challenge?
   
   - part of https://github.com/apache/datafusion/issues/13456
   - related to https://github.com/apache/datafusion/issues/16299
   
   I would like querying files from remote stores to be an easy and pleasant experience in DataFusion, and in `datafusion-cli` in particular.
   
   While testing https://github.com/apache/datafusion/pull/16300, I tried this 
command:
   
   ```shell
   datafusion-cli
   ```
   
   ```sql
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
   Object Store error: Object at location nyc_taxi_rides/data/tripdata_parquet not found: Error performing HEAD https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet in 142.679833ms - Server returned non-2xx status code: 404 Not Found:
   ```
   
   This confused me for quite a while, as that is a valid URL (prefix).
   
   The issue is that the URL `'s3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'` does not end in a `/`. If you add a trailing `/`, it works:
   
   ```sql
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   0 row(s) fetched.
   Elapsed 1.624 seconds.
   ```
   
   BTW, this is inconsistent with the local file system, where selecting from a directory whose path doesn't end in a `/` works just fine:
   
   ```sql
   -- Write data to `foo` directory:
   > copy (values(1)) to 'foo/1.parquet';
   +-------+
   | count |
   +-------+
   | 1     |
   +-------+
   1 row(s) fetched.
   Elapsed 0.044 seconds.
   
   -- Note the location doesn't end in `/` but it works fine
   > create external table foo stored as parquet location 'foo';
   0 row(s) fetched.
   Elapsed 0.022 seconds.
   
   > select * from foo;
   +---------+
   | column1 |
   +---------+
   | 1       |
   +---------+
   1 row(s) fetched.
   Elapsed 0.132 seconds.
   ```
   
   ### Describe the solution you'd like
   
   I would like this to be less confusing.
   
   
   
   
   ### Describe alternatives you've considered
   
   # Alternate 1: Better Error Message
   At the very least we can make the message more explicit ("Not found. Hint: 
if it is a directory the path should end with `/`")
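
A minimal sketch of the hint logic, written in Python purely for illustration (the real change would live in DataFusion's Rust code). The function name `not_found_message` and the exact message layout are assumptions, not existing DataFusion or `object_store` APIs:

```python
def not_found_message(location: str) -> str:
    """Build a 'not found' message, adding a hint when the location
    has no trailing slash and so might be a directory prefix.

    Hypothetical helper for illustration only.
    """
    message = f"Object at location {location} not found."
    if not location.endswith("/"):
        # Only hint when a trailing slash could plausibly fix the problem.
        message += " Hint: if it is a directory the path should end with `/`"
    return message
```

The hint is skipped when the location already ends in `/`, since in that case it would only add noise.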
   
   # Alternate 2: Preferred
   It would be even better to automatically add a `/` to the path if the original path is not found, and then try again.
   
   I think the trick will be to figure out at what level we should try to add 
`/` (probably when first creating the ListingTable?)
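
The retry idea could look roughly like the following sketch. It is Python for brevity and uses a plain set of keys as a stand-in for a remote store; `resolve_location` and the example file name are hypothetical, and where this logic belongs in DataFusion (for example, when creating the ListingTable) is still the open question above:

```python
def resolve_location(location: str, keys: set) -> str:
    """Return the location to use for a table, retrying with a trailing `/`.

    `keys` is a toy stand-in for the object keys in a bucket.
    """
    if location.endswith("/"):
        return location  # already an explicit directory prefix
    if location in keys:
        return location  # stands in for a successful HEAD on a single object
    candidate = location + "/"
    if any(key.startswith(candidate) for key in keys):
        return candidate  # stands in for a LIST that finds objects under the prefix
    raise FileNotFoundError(
        f"Object at location {location} not found, and no objects under {candidate}"
    )


# Usage: the key below is an invented example file under the prefix.
keys = {"nyc_taxi_rides/data/tripdata_parquet/part-0.parquet"}
resolved = resolve_location("nyc_taxi_rides/data/tripdata_parquet", keys)
```

Trying the exact path first keeps single-file locations working unchanged; the extra listing request is only issued after the first lookup fails.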
   
   ### Additional context
   
   _No response_

