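adamreeve commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2998703950

I've been experimenting with how this work could be extended to support more ways of configuring encryption, beyond fixed, known AES keys shared by all files. For example, in multi-file datasets, data encryption keys are often randomly generated per file, and each key is stored, encrypted, in that Parquet file's encryption metadata.

I have an example of how this could work, integrated with the [parquet-key-management](https://crates.io/crates/parquet-key-management) crate, in a [draft PR here](https://github.com/corwinjoy/datafusion/pull/4/files) if anyone is interested. It adds a new `EncryptionFactory` trait for dynamically generating file encryption and decryption properties, plus a registry of these factories in the runtime environment, so that an encryption factory can be referred to by a string identifier for compatibility with string-based configuration.

This should be a follow-up PR rather than part of this one, but I think it's worth mentioning here because it will require a separate way to configure encryption, rather than the new `ConfigFileDecryptionProperties` and `ConfigFileEncryptionProperties` types added in this PR. In theory, fixed AES keys could also be supported through an `EncryptionFactory` implementation, but configuring that is a bit clunky and opaque, so I think it makes sense to have more direct support for this simple scenario.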
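To make the factory-plus-registry idea more concrete, here is a minimal, stdlib-only Rust sketch. It is not the code from the draft PR: the trait's method signatures, the `EncryptionSettings`/`DecryptionSettings` stand-ins (used in place of the parquet crate's real property types), `EncryptionFactoryRegistry`, `FixedKeyFactory` and the `"key"` option name are all hypothetical, and no encryption is actually performed. It shows a factory being looked up by a string identifier and building per-file settings from opaque string options, with the fixed-key case written as a factory. That last part hints at why the route feels opaque: the key ends up as an untyped entry in a string map.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the per-file properties passed to a Parquet writer.
#[derive(Debug, Clone, PartialEq)]
pub struct EncryptionSettings {
    pub key: Vec<u8>,
}

/// Hypothetical stand-in for the per-file properties passed to a Parquet reader.
#[derive(Debug, Clone, PartialEq)]
pub struct DecryptionSettings {
    pub key: Vec<u8>,
}

/// Assumed trait shape: build settings for one file from opaque string options.
pub trait EncryptionFactory: Send + Sync {
    fn encryption_settings(
        &self,
        options: &HashMap<String, String>,
        file_path: &str,
    ) -> Result<EncryptionSettings, String>;

    fn decryption_settings(
        &self,
        options: &HashMap<String, String>,
        file_path: &str,
    ) -> Result<DecryptionSettings, String>;
}

/// Registry (e.g. held by the runtime environment) mapping string ids to factories,
/// so that string-based configuration can name the factory to use.
#[derive(Default)]
pub struct EncryptionFactoryRegistry {
    factories: RwLock<HashMap<String, Arc<dyn EncryptionFactory>>>,
}

impl EncryptionFactoryRegistry {
    /// Registers `factory` under `id`, returning any factory it replaced.
    pub fn register(
        &self,
        id: &str,
        factory: Arc<dyn EncryptionFactory>,
    ) -> Option<Arc<dyn EncryptionFactory>> {
        self.factories.write().unwrap().insert(id.to_string(), factory)
    }

    /// Looks up the factory named in the configuration.
    pub fn get(&self, id: &str) -> Option<Arc<dyn EncryptionFactory>> {
        self.factories.read().unwrap().get(id).cloned()
    }
}

/// Fixed-key factory: every file shares one AES key, given as hex under the
/// hypothetical option name "key". The file path is ignored.
pub struct FixedKeyFactory;

impl FixedKeyFactory {
    fn key_from_options(options: &HashMap<String, String>) -> Result<Vec<u8>, String> {
        let hex = options.get("key").ok_or("missing `key` option")?;
        if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err("`key` must be an even-length hex string".to_string());
        }
        // Decode each pair of hex digits into one byte (digits already validated).
        let key: Vec<u8> = (0..hex.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).expect("validated hex"))
            .collect();
        // AES accepts 128-, 192- and 256-bit keys.
        match key.len() {
            16 | 24 | 32 => Ok(key),
            n => Err(format!("unsupported AES key length: {n} bytes")),
        }
    }
}

impl EncryptionFactory for FixedKeyFactory {
    fn encryption_settings(
        &self,
        options: &HashMap<String, String>,
        _file_path: &str,
    ) -> Result<EncryptionSettings, String> {
        Ok(EncryptionSettings { key: Self::key_from_options(options)? })
    }

    fn decryption_settings(
        &self,
        options: &HashMap<String, String>,
        _file_path: &str,
    ) -> Result<DecryptionSettings, String> {
        Ok(DecryptionSettings { key: Self::key_from_options(options)? })
    }
}
```

A real implementation would presumably return the parquet crate's file encryption and decryption properties instead of these stand-ins, and a per-file factory (for example one backed by parquet-key-management) would make use of `file_path` and generate a fresh data key for each file rather than reading a single key from the options.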
I've been experimenting with how this work could be extended to support more ways of configuring encryption beyond having fixed and known AES keys for all files. For example, data encryption keys are often randomly generated per file in multi-file datasets, and the keys are stored encrypted in the Parquet file's encryption metadata. I've got an example of how this could work that integrates with the [parquet-key-management](https://crates.io/crates/parquet-key-management) crate in a [draft PR here](https://github.com/corwinjoy/datafusion/pull/4/files) if anyone is interested. I've added a new `EncryptionFactory` trait for dynamically generating file encryption and decryption properties, and used a registry of these in the runtime environment to allow identifying the encryption factory with a string identifier for compatibility with string based configuration. This should be a follow up PR rather than part of this PR, but I think it's worth mentioning here as this will require adding a separate way to configure encryption rather than using the new `ConfigFileDecryptionProperties` and `ConfigFileEncryptionProperties` types in this PR. In theory, using fixed AES keys could be implemented with an `EncryptionFactory` implementation, but the configuration for this is a bit clunky and opaque, so I think it makes sense to have more direct support for this simple scenario. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org