A motivational example: Trino had to implement parallel table metadata
fetching recently (https://github.com/trinodb/trino/pull/23909) because
otherwise metadata queries (e.g., INFORMATION_SCHEMA) were slow. Parallel
metadata retrieval boosted metadata query performance significantly. But
this sol
The proposal looks great to me. Thanks, Gabor, for working on it. Have we
created a spec change PR yet?
Yufei
On Thu, Dec 19, 2024 at 2:11 AM Gabor Kaszab wrote:
> Hi All,
>
> Just an update that the proposal went through some iterations based on the
> comments from Daniel Weeks. Thanks for taki
That sounds really interesting, in a bad way :) :(
This basically means that we would need to support every exact Hive version
used by Spark, and we would need to exclude our own Hive version from
the Spark runtime.
On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote:
> Hi Peter,
>
>> I think we should
Actually, there is a way for the catalog to return S3 objects without
granting access to the entire bucket: AWS presigning:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
This offers time-bounded access to an object; the catalog will need to
generate and return the pres
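To make the mechanism concrete, here is a rough sketch of what a presigned
GET URL looks like under the hood (SigV4 query-string signing). In practice a
catalog would just call the AWS SDK (e.g. boto3's generate_presigned_url)
rather than sign by hand; the bucket, key, and credentials below are made-up
placeholders, and the helper name is mine, not anything from the proposal:

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

def presign_get(bucket, key, access_key, secret_key,
                region="us-east-1", expires=900):
    """Sketch of a SigV4-presigned GET URL for one S3 object.

    The signature is bound to this bucket/key and expires after
    `expires` seconds, which is what gives time-bounded, per-object
    access without granting rights on the whole bucket.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"

    # Query parameters that carry the signing metadata.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        "/" + quote(key, safe="/"),
        canonical_query,
        f"host:{host}\n",      # canonical headers (host only), blank line after
        "host",                # signed header names
        "UNSIGNED-PAYLOAD",    # presigned GETs do not hash a body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])

    # Derive the signing key: HMAC chain over date, region, service.
    def _hmac(k, msg):
        return hmac.new(k, msg.encode(), hashlib.sha256).digest()

    k = _hmac(("AWS4" + secret_key).encode(), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    sig = hmac.new(k, string_to_sign.encode(), hashlib.sha256).hexdigest()

    return (f"https://{host}/{quote(key, safe='/')}"
            f"?{canonical_query}&X-Amz-Signature={sig}")

# Example with dummy credentials (no network call is made here):
url = presign_get("my-table-bucket", "data/part-0.parquet",
                  "AKIDEXAMPLE", "not-a-real-secret")
```

Anyone holding the resulting URL can fetch that one object until the
expiry, so the catalog can hand these out per file after doing its own
authorization check.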
Hi Vladimir and JB,
There have been some previous discussions on security [1].
> We can think about splitting table data into multiple files for
> column-level security and masking. For example, instead of storing columns
> [a, b, c] in the same Parquet file, we split them into three files: [a,