adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2852529965
> Here is how spark does encryption configuration

My understanding of how this works in Spark, from reading this and looking at some of the code:

* Spark requires specifying a class used to generate file encryption and/or decryption properties. This is configured with the `spark.hadoop.parquet.crypto.factory.class` setting, and the class needs to implement `EncryptionPropertiesFactory` and/or `DecryptionPropertiesFactory` to generate file encryption or decryption properties as required. The class gets access to extra context like the file schema so it knows which columns to provide keys for (see the [getFileEncryptionProperties](https://github.com/apache/parquet-java/blob/142bff02b09c468783f11f452b3dec9174c56a2a/parquet-hadoop/src/main/java/org/apache/parquet/crypto/EncryptionPropertiesFactory.java#L94-L110) method).
* Spark supports the KMS-based API through a built-in `PropertiesDrivenCryptoFactory` class that implements both `EncryptionPropertiesFactory` and `DecryptionPropertiesFactory`. This additionally requires specifying a `KmsClient` implementation with the `spark.hadoop.parquet.encryption.kms.client.class` key, and that class must be supplied by users (only a mock `InMemoryKMS` class is provided for testing).
* In theory, a user could also define their own class implementing `EncryptionPropertiesFactory` and `DecryptionPropertiesFactory` if they don't want to use the KMS-based API, for example if they want to define AES keys directly.

Starting with similarly flexible `EncryptionPropertiesFactory` and `DecryptionPropertiesFactory` traits in DataFusion seems like a reasonable approach to me. I'm not that familiar with Java, but from what I understand it's straightforward to define your own `KmsClient` in a JAR and then include it at runtime so it's discoverable by the configuration mechanism. That approach doesn't really translate to Rust, though.
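To make the proposal concrete, here is a minimal sketch of what such traits might look like in Rust. All of the names here are hypothetical: the property structs are simplified stand-ins (the real ones would come from the `parquet` crate), and none of the signatures reflect an agreed DataFusion API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for parquet's encryption property types.
#[derive(Debug, Clone)]
struct FileEncryptionProperties {
    footer_key: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
}

#[derive(Debug, Clone)]
struct FileDecryptionProperties {
    footer_key: Vec<u8>,
}

/// Sketch of a trait mirroring parquet-java's EncryptionPropertiesFactory.
/// Per-file context (path, column names standing in for the schema) lets the
/// factory decide which columns need keys, like getFileEncryptionProperties.
trait EncryptionPropertiesFactory: Send + Sync {
    fn file_encryption_properties(
        &self,
        file_path: &str,
        column_names: &[&str],
    ) -> Option<FileEncryptionProperties>;
}

/// Sketch of the decryption-side counterpart.
trait DecryptionPropertiesFactory: Send + Sync {
    fn file_decryption_properties(&self, file_path: &str) -> Option<FileDecryptionProperties>;
}

/// A trivial factory supplying static AES keys directly, i.e. the
/// "don't want the KMS-based API" case from the bullet points above.
struct StaticKeyFactory {
    key: Vec<u8>,
}

impl EncryptionPropertiesFactory for StaticKeyFactory {
    fn file_encryption_properties(
        &self,
        _file_path: &str,
        column_names: &[&str],
    ) -> Option<FileEncryptionProperties> {
        // Reuse one key everywhere; a real factory would derive per-column keys.
        let column_keys = column_names
            .iter()
            .map(|c| (c.to_string(), self.key.clone()))
            .collect();
        Some(FileEncryptionProperties {
            footer_key: self.key.clone(),
            column_keys,
        })
    }
}

fn main() {
    let factory: Arc<dyn EncryptionPropertiesFactory> = Arc::new(StaticKeyFactory {
        key: vec![0u8; 16],
    });
    let props = factory
        .file_encryption_properties("data/part-0.parquet", &["ssn", "email"])
        .unwrap();
    println!("column keys: {}", props.column_keys.len());
}
```

A KMS-backed implementation could then implement the same traits and call out to a user-defined KMS client internally, matching the layering of `PropertiesDrivenCryptoFactory` in parquet-java.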
If any custom code is needed, it will have to be compiled in unless we use something like WebAssembly or an FFI, but that seems overly complicated and unnecessary. We could keep some level of string-configurability by letting users statically register named implementations of the traits in code and then reference those names in configuration strings. Corwin mentioned the `typetag` crate, which can automate this, or it could be done more manually.

> I personally suggest using the `Arc<dyn Any>` approach

I don't really understand the reason for using `Any` rather than a trait object like `Arc<dyn EncryptionPropertiesFactory>`. At some point the `Any` would need to be downcast to something DataFusion understands for it to be usable, right? But I agree we should come up with an example of how we'd like this to work, and that should provide more clarity.
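As a rough illustration of the "register named implementations, reference them in config strings" idea (the manual alternative to `typetag`), here is a sketch with a process-wide registry. Everything here is hypothetical, including the trait, the registry functions, and the config key mentioned in the comment.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical stand-in for the factory trait discussed above.
trait EncryptionPropertiesFactory: Send + Sync {
    fn name(&self) -> &str;
}

struct MockKmsFactory;

impl EncryptionPropertiesFactory for MockKmsFactory {
    fn name(&self) -> &str {
        "mock_kms"
    }
}

// Process-wide registry: implementations are registered in code at startup,
// and configuration strings refer to them by name.
static REGISTRY: OnceLock<Mutex<HashMap<String, Arc<dyn EncryptionPropertiesFactory>>>> =
    OnceLock::new();

fn register_factory(name: &str, factory: Arc<dyn EncryptionPropertiesFactory>) {
    REGISTRY
        .get_or_init(Default::default)
        .lock()
        .unwrap()
        .insert(name.to_string(), factory);
}

// Resolve a factory from a configuration string at plan/scan time.
fn factory_from_config(name: &str) -> Option<Arc<dyn EncryptionPropertiesFactory>> {
    REGISTRY
        .get_or_init(Default::default)
        .lock()
        .unwrap()
        .get(name)
        .cloned()
}

fn main() {
    // Done once in application setup, analogous to putting a KmsClient
    // implementation on the classpath in the Java/Spark world.
    register_factory("mock_kms", Arc::new(MockKmsFactory));

    // A string setting (e.g. some "crypto factory" config key) would then
    // resolve through the registry rather than loading a class dynamically.
    let factory = factory_from_config("mock_kms").expect("factory registered");
    println!("resolved: {}", factory.name());
}
```

This keeps the configuration surface string-based, like Spark's class-name settings, while staying within what's possible in a statically compiled Rust binary.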