Re: [DISCUSS] REST Catalog bulk object lookup

2025-01-03 Thread Vladimir Ozerov
A motivational example: Trino recently had to implement parallel table metadata fetching (https://github.com/trinodb/trino/pull/23909) because otherwise metadata queries (e.g., INFORMATION_SCHEMA) were slow. Parallel metadata retrieval boosted metadata query performance significantly. But this sol
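To illustrate why per-table round trips make metadata queries slow and how fetching them in parallel helps, here is a minimal Python sketch. `load_table_metadata` is a hypothetical stand-in for one REST catalog call per table, with network latency simulated by `time.sleep`; it is not the Trino implementation from the linked PR, nor an Iceberg REST endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for one REST catalog round trip per table
# (latency simulated with a fixed sleep; not a real Iceberg or Trino API).
def load_table_metadata(name: str) -> dict:
    time.sleep(0.05)  # simulated network latency
    return {"name": name, "schema": ["id", "ts"]}


def fetch_sequential(names):
    # One request after another: total time grows linearly with len(names).
    return [load_table_metadata(n) for n in names]


def fetch_parallel(names, max_workers=8):
    # Overlap the round trips; executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_table_metadata, names))


if __name__ == "__main__":
    tables = [f"db.t{i}" for i in range(16)]
    for fn in (fetch_sequential, fetch_parallel):
        start = time.perf_counter()
        result = fn(tables)
        print(fn.__name__, len(result), f"{time.perf_counter() - start:.2f}s")
```

Parallelism hides latency but still issues one request per table; a bulk lookup endpoint, the subject of this thread, would instead reduce the number of requests.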

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2025-01-03 Thread Yufei Gu
The proposal looks great to me. Thanks Gabor for working on it. Have we created a spec change PR yet? Yufei On Thu, Dec 19, 2024 at 2:11 AM Gabor Kaszab wrote: > Hi All, > > Just an update that the proposal went through some iterations based on the > comments from Daniel Weeks. Thanks for taki

Re: [DISCUSS] Hive Support

2025-01-03 Thread Péter Váry
That sounds really interesting in a bad way :) :( This basically means that we need to support every exact Hive version that Spark uses, and we need to exclude our own Hive version from the Spark runtime. On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote: > Hi Peter, > >> I think we should

Re: There is no easy way to secure Iceberg data. How can we improve?

2025-01-03 Thread Steve Loughran
Actually, there is a way for the catalog to return S3 objects without granting access to the entire bucket: AWS presigning: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html This offers time-bounded access to an object; the catalog will need to generate and return the pres
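The general idea behind presigned URLs (a signature over the object path plus an expiry time, checked by the storage service) can be sketched with an HMAC. This is a simplified, hypothetical scheme for illustration only: it is not AWS Signature Version 4, the secret and URLs are made up, and a real catalog would use the AWS SDK's presigning support rather than hand-rolled signing.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret shared by the catalog and the storage service.
SECRET = b"catalog-signing-key"


def _sign(path: str, expires: int) -> str:
    # The signature binds the object path and the expiry together.
    msg = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign(base: str, path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Catalog side: return a URL granting access to one object until expiry."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"{base}{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an untampered URL whose expiry has not passed."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires)
    return hmac.compare_digest(sig, expected) and now < expires
```

Each URL covers a single object for a bounded time, so the catalog would have to generate one per file the client is allowed to read.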

Re: There is no easy way to secure Iceberg data. How can we improve?

2025-01-03 Thread Micah Kornfield
Hi Vladimir and JB, There have been some previous discussions on security [1]. > We can think about splitting table data into multiple files for > column-level security and masking. For example, instead of storing columns > [a, b, c] in the same Parquet file, we split them into three files: [a,
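A minimal sketch of the quoted idea, storing each column in its own file so access can be granted per column. Plain Python lists stand in for Parquet files, and `split_columns`/`read_allowed` are hypothetical helpers for illustration; nothing here is an Iceberg API.

```python
from typing import Dict, List

Row = Dict[str, object]


def split_columns(rows: List[Row], columns: List[str]) -> Dict[str, List[object]]:
    # One "file" per column; row order is identical across all files.
    return {col: [row[col] for row in rows] for col in columns}


def read_allowed(files: Dict[str, List[object]], allowed: List[str]) -> List[Row]:
    # Open only the files for permitted columns and stitch them back into
    # rows by position; columns the reader may not see are never touched.
    cols = [c for c in allowed if c in files]
    if not cols:
        return []
    n = len(files[cols[0]])
    return [{c: files[c][i] for c in cols} for i in range(n)]
```

The trade-off is that a reader needing several columns must open several files and align them by row position.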