Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-10 Thread Daniel Weeks
Hey Mayur and Laurent, As an alternative to using S3FileIO to talk to GCS, I just posted a native GCSFileIO implementation and would really appreciate feedback. I'd prefer to go this route, which has a number of advantages (like using gRPC eventually)

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-10 Thread Ryan Blue
I think there's some confusion here. The change doesn't make S3FileIO the handler for gs URIs. All it does is allow gs URIs when you've configured S3FileIO for your catalog. That's why #3656 is "remove S3 URI scheme restrictions". I think we do want to have a native GCSFileIO implementation. And

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-10 Thread Jack Ye
Yes, the intention is to allow S3FileIO to be used to temporarily unblock users who are using an S3-compatible storage service or framework and can directly use it to make requests through the AWS S3 SDK. We have seen this repeatedly occur for MinIO, Dell EMC ECS, GCS. But I think we should always p

Re: Supporting TINYINT and SMALLINT in Iceberg

2021-12-10 Thread Walaa Eldin Moustafa
Just to update this thread, we have agreed internally to use INT in the struct schema corresponding to union types. The reasons are two-fold: (1) Uncertainty around whether TINYINT will make it to Iceberg while we wanted to stick to the spec. (2) Since Avro does not support TINYINT either, this iss

Re: Supporting TINYINT and SMALLINT in Iceberg

2021-12-10 Thread Ryan Blue
For TINYINT and SMALLINT, I don't think there is any advantage at the storage layer. Avro uses variable-length ints and the columnar formats, Parquet and ORC, will do efficient encodings for multiple values in a column. I don't see much value in these types, besides compatibility with existing SQL.
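
To illustrate the point about variable-length ints, here is a minimal sketch of the zigzag plus base-128 varint scheme that the Avro specification describes for its int type. The function names (zigzag32, encode_varint, encode_avro_int) are made up for this example and are not part of any Avro or Iceberg API; the sketch only shows that small values already occupy one or two bytes without a dedicated TINYINT or SMALLINT type.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to an unsigned one so small magnitudes stay small."""
    if not -2**31 <= n < 2**31:
        raise ValueError("value does not fit in a signed 32-bit int")
    # Python's >> is arithmetic, so (n >> 31) is 0 for n >= 0 and -1 for n < 0.
    return (n << 1) ^ (n >> 31)


def encode_varint(u: int) -> bytes:
    """Encode a non-negative int in 7-bit groups, least significant group first."""
    out = bytearray()
    while True:
        group = u & 0x7F
        u >>= 7
        if u:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def encode_avro_int(n: int) -> bytes:
    """Hypothetical helper: zigzag, then varint, as in Avro's binary int encoding."""
    return encode_varint(zigzag32(n))


if __name__ == "__main__":
    # Encoded size for values at the edges of the TINYINT, SMALLINT, and INT ranges.
    for value in (0, -1, 63, -64, 127, -128, 32767, -32768, 2**31 - 1):
        print(value, len(encode_avro_int(value)))
```

Under this scheme every value in the TINYINT range (-128 to 127) encodes to at most two bytes and every value in the SMALLINT range to at most three, which is consistent with the observation that narrower types bring little benefit at the storage layer.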