Re: [PR] feat: Parquet modular encryption [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3025176564 Thank you @corwinjoy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-30 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3021088650 Thanks @alamb, much appreciated for the review and helpful feedback! We hope to have a follow-up PR soon with a config to make encryption optional.

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3015285765 Thanks again @corwinjoy / @adamreeve and everyone else. This is great

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-28 Thread via GitHub
alamb merged PR #16351: URL: https://github.com/apache/datafusion/pull/16351

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168131023 ## datafusion-cli/tests/sql/encrypted_parquet.sql: ## @@ -0,0 +1,75 @@ +/* +Test parquet encryption and decryption in DataFusion SQL. +See datafusion/com

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168097037 ## datafusion/common/src/config.rs: ## @@ -2017,6 +2056,305 @@ config_namespace_with_hashmap! { } } +#[derive(Clone, Debug, Default, PartialEq)] +pub str

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168085502 ## docs/source/user-guide/configs.md: ## @@ -81,6 +81,8 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execut

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168078769 ## datafusion/proto-common/src/from_proto/mod.rs: ## @@ -1066,6 +1066,7 @@ impl TryFrom<&protobuf::TableParquetOptions> for TableParquetOptions {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167900882 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -930,12 +959,14 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &Obje

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167883986 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -930,12 +959,14 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &Obje

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167542855 ## docs/source/user-guide/configs.md: ## @@ -81,6 +81,8 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execution.

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167546893 ## datafusion-cli/tests/sql/encrypted_parquet.sql: ## @@ -0,0 +1,75 @@ +/* +Test parquet encryption and decryption in DataFusion SQL. +See datafusion/common/

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3003280082 > Thank you @corwinjoy and @adamreeve -- this PR was a joy to read and review. The code is clear, well commented, and well tested ❤️ 🏆 > > I think we should follow up with:

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165756822 ## datafusion-cli/tests/sql/encrypted_parquet.sql: ## @@ -0,0 +1,75 @@ +/* +Test parquet encryption and decryption in DataFusion SQL. +See datafusion/com

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165246656 ## docs/source/user-guide/configs.md: ## @@ -81,6 +81,8 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execut

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165219803 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_with_

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165202964 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_w

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2164809665 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_with_

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
mbutrovich commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3000137303 > I am sorry I haven't had a chance to review this yet. It would be great if @mbutrovich could also take a look. I have this on my list to review but I haven't been able to find t

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-23 Thread via GitHub
adamreeve commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2998703950 I've been experimenting with how this work could be extended to support more ways of configuring encryption beyond having fixed and known AES keys for all files. For example, data
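The comment above (truncated in this archive) is about configuring encryption beyond a fixed, known set of AES keys shared by all files. One way to picture that is to move key lookup behind a per-file provider. The following is a minimal Rust sketch under that assumption only: `FileKeys`, `KeyProvider`, `FixedKeys` and `PerFileKeys` are hypothetical names, not DataFusion or `parquet` crate APIs, and this is not the design proposed in the thread.

```rust
use std::collections::HashMap;

/// Hypothetical: the keys needed for one Parquet file under modular encryption.
/// Not a DataFusion or parquet-crate type.
#[derive(Clone, Debug, PartialEq)]
pub struct FileKeys {
    /// Key protecting the footer (file metadata).
    pub footer_key: Vec<u8>,
    /// Optional per-column keys, indexed by column path.
    pub column_keys: HashMap<String, Vec<u8>>,
}

/// Hypothetical trait: resolve keys per file instead of using one fixed key set.
pub trait KeyProvider {
    fn keys_for_file(&self, file_path: &str) -> Option<FileKeys>;
}

/// Fixed, known keys reused for every file.
pub struct FixedKeys(pub FileKeys);

impl KeyProvider for FixedKeys {
    fn keys_for_file(&self, _file_path: &str) -> Option<FileKeys> {
        Some(self.0.clone())
    }
}

/// Keys looked up per file path (hypothetically filled from a key management service).
pub struct PerFileKeys {
    pub by_path: HashMap<String, FileKeys>,
}

impl KeyProvider for PerFileKeys {
    fn keys_for_file(&self, file_path: &str) -> Option<FileKeys> {
        self.by_path.get(file_path).cloned()
    }
}

/// Callers only see the trait object, so the key source can be swapped.
fn footer_key_len(provider: &dyn KeyProvider, path: &str) -> Option<usize> {
    provider.keys_for_file(path).map(|k| k.footer_key.len())
}

fn main() {
    let keys = FileKeys { footer_key: b"0123456789012345".to_vec(), column_keys: HashMap::new() };
    let fixed = FixedKeys(keys.clone());
    assert_eq!(footer_key_len(&fixed, "a.parquet"), Some(16));

    let mut by_path = HashMap::new();
    by_path.insert("a.parquet".to_string(), keys);
    let per_file = PerFileKeys { by_path };
    assert_eq!(footer_key_len(&per_file, "a.parquet"), Some(16));
    assert_eq!(footer_key_len(&per_file, "b.parquet"), None); // unknown file: no keys
}
```

Putting the lookup behind a trait keeps the fixed-key case as one implementation among several, so readers and writers need not change when a different key source is plugged in.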

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2151077910 ## datafusion/common/src/config.rs: ## @@ -591,6 +930,12 @@ config_namespace! { /// writing out already in-memory data, such as from a cached /

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2151059658 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2978284818 I am sorry I haven't had a chance to review this yet. It would be great if @mbutrovich could also take a look. I have this on my list to review but I haven't been able to find the tim

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
mbutrovich commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2977839997 Thank you and @adamreeve for driving so much of the modular encryption work! I'll take a look at this branch this week and see how this might get Comet supporting modular encrypti

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-12 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2968354595 @alamb One piece I would like to solicit feedback on is if there is a way to leverage the existing tests to more thoroughly vet encryption. What I mean by that, is that we uncovere

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-10 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2139135378 ## datafusion/common/src/config.rs: ## @@ -591,6 +930,12 @@ config_namespace! { /// writing out already in-memory data, such as from a cached /// d

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136730391 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136710028 ## benchmarks/src/bin/dfbench.rs: ## @@ -60,11 +60,11 @@ pub async fn main() -> Result<()> { Options::Cancellation(opt) => opt.run().await, Opt

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136714217 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2957368512 @adamreeve @rok

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136732397 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136733767 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136718671 ## datafusion/common/src/config.rs: ## @@ -591,6 +930,12 @@ config_namespace! { /// writing out already in-memory data, such as from a cached /

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136721651 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136715685 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

[PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy opened a new pull request, #16351: URL: https://github.com/apache/datafusion/pull/16351 ## Which issue does this PR close? - Closes #15216. ## What changes are included in this PR? This PR adds support for encryption in DataFusion’s Parquet implementation.
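Parquet modular encryption protects the footer with a footer key and can protect individual columns with their own keys; the Parquet specification allows AES keys of 128, 192 or 256 bits. If keys are passed through string-valued configuration options, one plausible encoding is hex, but that is an assumption here, not something the truncated PR text confirms. A minimal sketch of decoding and validating such a key; `decode_aes_key_hex` is a hypothetical helper, not DataFusion's API.

```rust
/// Convert one ASCII hex digit to its value; non-hex bytes yield None.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical helper: decode a hex string into key bytes and check the
/// length is a valid AES key size (16, 24 or 32 bytes).
pub fn decode_aes_key_hex(hex: &str) -> Result<Vec<u8>, String> {
    let bytes = hex.as_bytes();
    if bytes.len() % 2 != 0 {
        return Err(format!("odd number of hex digits: {}", bytes.len()));
    }
    let mut key = Vec::with_capacity(bytes.len() / 2);
    for pair in bytes.chunks(2) {
        match (hex_val(pair[0]), hex_val(pair[1])) {
            // High nibble first, then low nibble.
            (Some(hi), Some(lo)) => key.push((hi << 4) | lo),
            _ => return Err(format!("invalid hex digit in {:?}", hex)),
        }
    }
    match key.len() {
        16 | 24 | 32 => Ok(key),
        n => Err(format!("AES key must be 16, 24 or 32 bytes, got {}", n)),
    }
}

fn main() {
    // Hex for the ASCII bytes "0123456789012345", i.e. a 128-bit key.
    let key = decode_aes_key_hex("30313233343536373839303132333435").unwrap();
    assert_eq!(key, b"0123456789012345".to_vec());
    assert!(decode_aes_key_hex("abc").is_err()); // odd length
    assert!(decode_aes_key_hex("00").is_err()); // 1 byte is not an AES key size
    println!("decoded {} key bytes", key.len());
}
```

Validating the length at configuration time surfaces a mistyped key as a clear error rather than as a later failure inside the encryption layer.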