adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2739092901
I had a go at seeing whether this callback-based configuration approach could be used to integrate with encryption without DataFusion needing to know anything about Parquet encryption. I tested with Rok's [current encryption branch](https://github.com/rok/arrow-rs/tree/a522ab105fd7f2b8d4ef227b2b75489a93a137af), and tried both reading and writing encrypted files. I wrote an example that nearly works here: https://github.com/adamreeve/datafusion/blob/encryption_poc/datafusion-examples/examples/parquet_encryption.rs

One obvious downside is that this isn't compatible with conversion of plans to protobuf. I've just ignored the new config fields there. Ideally we would at least raise an error if they're set and we try to convert a plan to protobuf, but that might require adding a "parquet" feature to the protobuf crates. I think this is fine though, as long as you don't want to use this in a distributed query engine.

I could get writing of encrypted Parquet working, but reading fails when trying to infer the schema. Schema inference uses a `ParquetMetaDataReader` ([here](https://github.com/apache/datafusion/blob/19dd46d3c65e3160c0fb949880bc3e960434e96d/datafusion/datasource-parquet/src/file_format.rs#L765)), which doesn't know about the `ArrowReaderOptions` but instead uses `FileDecryptionProperties` directly ([here](https://github.com/apache/arrow-rs/blob/660a3ac22a8ef8601acf4548d65146bc623f653a/parquet/src/file/metadata/reader.rs#L80)).

I'm not sure how best to work around that. Maybe arrow-rs could be refactored to support reader options that aren't Arrow specific, which could then be passed to the `ParquetMetaDataReader`? Or maybe DataFusion could change how metadata is read and use `ParquetObjectReader::get_metadata_with_options` instead? I made a brief attempt at that but didn't get very far. There is a comment on `fetch_parquet_metadata` though that says "This component is a subject to **change** in near future"...

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
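
For illustration, the callback-based configuration idea discussed in the comment above could be sketched roughly as follows. This is a minimal, standard-library-only sketch and not the code from the linked example: `ReaderOptions`, `ParquetConfig`, `configure_reader`, `SerializeError` and `to_serializable` are all hypothetical stand-ins, not real DataFusion or arrow-rs types. It shows an engine applying an opaque callback to reader options without knowing what the callback does, and a serialization step that returns an error rather than silently ignoring the callback, which is the behaviour the comment describes as ideal for protobuf conversion.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical stand-in for reader options such as `ArrowReaderOptions`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ReaderOptions {
    /// Placeholder for whatever the callback attaches (e.g. key material).
    pub decryption_key: Option<Vec<u8>>,
}

/// Callback that customises reader options. The engine never inspects
/// what it does, so it needs no knowledge of encryption.
pub type ReaderOptionsFn = Arc<dyn Fn(ReaderOptions) -> ReaderOptions + Send + Sync>;

/// Hypothetical config struct carrying an optional callback field.
#[derive(Clone, Default)]
pub struct ParquetConfig {
    pub pushdown_filters: bool,
    pub configure_reader: Option<ReaderOptionsFn>,
}

/// Error returned when the config cannot be serialized.
#[derive(Debug, PartialEq)]
pub struct SerializeError(pub String);

impl fmt::Display for SerializeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "serialization error: {}", self.0)
    }
}

impl std::error::Error for SerializeError {}

impl ParquetConfig {
    /// Build the options for one file, applying the callback if present.
    pub fn reader_options(&self) -> ReaderOptions {
        let base = ReaderOptions::default();
        match &self.configure_reader {
            Some(callback) => callback(base),
            None => base,
        }
    }

    /// Stand-in for plan-to-protobuf conversion. A closure cannot be
    /// serialized, so fail loudly instead of dropping the field.
    pub fn to_serializable(&self) -> Result<Vec<u8>, SerializeError> {
        if self.configure_reader.is_some() {
            return Err(SerializeError(
                "configure_reader callback cannot be serialized".to_string(),
            ));
        }
        // Toy encoding: one byte for the only plain field.
        Ok(vec![self.pushdown_filters as u8])
    }
}
```

The same shape also hints at the schema-inference problem: any code path that builds its reader without going through something like `reader_options()` never sees what the callback configured.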