Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

via GitHub Fri, 11 Jul 2025 13:30:16 -0700


corwinjoy commented on code in PR #16738:
URL: https://github.com/apache/datafusion/pull/16738#discussion_r2201788873



##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1571,12 +1564,14 @@ fn spawn_parquet_parallel_serialization_task(
         let max_row_group_rows = writer_props.max_row_group_size();
         let (mut column_writer_handles, mut col_array_channels) =
             spawn_column_parallel_row_group_writer(
-                Arc::clone(&schema),
-                Arc::clone(&writer_props),
+                Arc::clone(&arrow_row_group_writer_factory),
                 max_buffer_rb,
                 &pool,
             )?;
         let mut current_rg_rows = 0;
+        // TODO: row_group_writer should use the correct row group index. Currently this would fail if
+        // multiple row groups were written.
+        // let mut rg_index = 0;

Review Comment:
   @adamreeve Definitely something to note. We will want to resolve this before the final PR.
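
   For context on why the index matters: in Parquet modular encryption the row group ordinal feeds into the AAD of each encrypted module, so every row group writer needs its own, correct index. Below is a minimal sketch of the bookkeeping the TODO hints at. `RowGroupIndexTracker` and `assign` are hypothetical names for illustration only, not part of DataFusion or the parquet crate; only `max_row_group_rows`, `current_rg_rows`, and `rg_index` mirror names from the diff.

```rust
/// Hypothetical helper (not a DataFusion or parquet API): tracks which
/// row group index incoming rows belong to as row groups fill up.
struct RowGroupIndexTracker {
    max_row_group_rows: usize,
    current_rg_rows: usize,
    rg_index: usize,
}

impl RowGroupIndexTracker {
    fn new(max_row_group_rows: usize) -> Self {
        // A zero-sized row group could never fill, so reject it up front.
        assert!(max_row_group_rows > 0, "row group size must be non-zero");
        Self {
            max_row_group_rows,
            current_rg_rows: 0,
            rg_index: 0,
        }
    }

    /// Splits `rows` incoming rows across row groups and returns
    /// `(row group index, row count)` pairs in write order.
    fn assign(&mut self, mut rows: usize) -> Vec<(usize, usize)> {
        let mut slices = Vec::new();
        while rows > 0 {
            let room = self.max_row_group_rows - self.current_rg_rows;
            let take = rows.min(room);
            slices.push((self.rg_index, take));
            self.current_rg_rows += take;
            rows -= take;
            // Row group is full: the next writer must use the next index.
            if self.current_rg_rows == self.max_row_group_rows {
                self.rg_index += 1;
                self.current_rg_rows = 0;
            }
        }
        slices
    }
}
```

   In the real code the incremented index would presumably be handed to the row group writer factory each time a new set of column writers is spawned; how exactly that is wired up is left to the PR.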



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


