Re: Key rotation in Iceberg data encryption

2021-03-25 Thread Gidon Gershinsky
Sounds good. Giving users a tool, and leaving them the decision on whether to rotate a KEK and replace the manifest file, is a flexible way to address this for now. As we gather more information on the safety of unrotated KEKs, and on the consequences of replacing the manifest files, we can either d

Re: Key rotation in Iceberg data encryption

2021-03-25 Thread Ye, Jack
Yes, I totally agree with Russell that key rotation should be treated as something like a rewrite manifest action, and when the rewrite completes, the old files with old keys can be expired in a separate snapshot expiration action. Because of requirements like GDPR, this expiration would happen

Re: Question on ordering on partitions when read

2021-03-25 Thread Ryan Blue
Yeah, I'd use IcebergGenerics to read a table. That's the simplest way. On Thu, Mar 25, 2021 at 11:49 AM Chen Song wrote: > Thanks Ryan. Reading one partition at a time sounds a logical thing to me > in my case. > > I cannot use a query engine for now. In that case, if IcebergGenerics > still th

Re: Question on ordering on partitions when read

2021-03-25 Thread Chen Song
Thanks Ryan. Reading one partition at a time sounds like a logical thing to me in my case. I cannot use a query engine for now. In that case, is IcebergGenerics still the best way to read via core API? On Thu, Mar 25, 2021 at 2:16 PM Ryan Blue wrote: > Hi Chen, > > Iceberg doesn't guarantee any orde

Re: Question on ordering on partitions when read

2021-03-25 Thread Ryan Blue
Hi Chen, Iceberg doesn't guarantee any order for records returned by `IcebergGenerics`. If you want a specific order, I'd recommend using a query engine to sort or to read a partition at a time and then sort within that partition. Iceberg can't really guarantee order across files. The sort order

Re: Key rotation in Iceberg data encryption

2021-03-25 Thread Russell Spitzer
I think you can treat the key rotation as a Spark action like "RewriteManifestsAction" or something like that, which creates a new Snapshot and a new set of manifest files. If we want to be secure we would follow this up by immediately exporting and deleting previous snapshots and manifests. One probl

Key rotation in Iceberg data encryption

2021-03-25 Thread Gidon Gershinsky
Hi all, We're working with Jack on a design for encryption of Iceberg data tables, and got a question / decision point we'd like to bring to the community's attention. Might be a bit exotic, but is important, so we have to try this. Any input on this subject, or pointers to relevant contacts / sou

Re: Question on ordering on partitions when read

2021-03-25 Thread Chen Song
Popping up the question. On Wed, Mar 24, 2021 at 2:01 PM Chen Song wrote: > I want to clarify the ordering semantics (if deterministic) on partitions > returned when using iceberg core data API to read. > > Say I define a table with a *time* column and partition by *day(time)*, and > do the foll

Re: Extending Apache Iceberg Encryption Module

2021-03-25 Thread Gidon Gershinsky
I must say I'm impressed with the level of constructiveness and technical quality in this discussion; we're off to a good start in this project. *For POC, I think what you conclude is mostly correct, I am currently implementing the encryption spec, general encrypted file stream with KMS API, and I