jordepic opened a new pull request, #4658: URL: https://github.com/apache/datafusion-comet/pull/4658
## Which issue does this PR close?

Closes #4322.

## Rationale for this change

Iceberg Spark writes are V2 operators that bundle three responsibilities: writing data files, writing metadata files, and committing to the catalog. Of these, Comet is only well positioned to accelerate data file writing (assuming the files are Parquet). It is also crucial that the data-file-writing part of an Iceberg write plan sits inside the AQE block of the Spark plan, so that writes are re-planned in response to runtime decisions about their upstream operators.

Our split is fairly simple: the "writer" operator writes the data files as usual and serializes its output, which is then passed to the "committer" operator. In the future, we'll target only the "writer" operator for speedup with iceberg-rust.

## What changes are included in this PR?

This PR contains 5 commits:

1) Docs outlining the WHOLE Iceberg write acceleration feature, not just these changes (I'm happy to modify or remove them as needed).
2) Planning rules to move Iceberg append and overwrite operations to our "split operator" design.
3) Planning rules to move Iceberg delete, update, and merge operations to our "split operator" design.
4) Tests for part 2.
5) Tests for part 3.

## How are these changes tested?

We have unit tests for each operator that we're replacing. They ensure that the plan shape is correct, that we commit to our Iceberg table the expected number of times, and that the table's end state is correct when we scan it after a write operation. I've also been running with these changes locally, and they are performing as expected.
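To make the writer/committer split from the rationale concrete, here is a minimal conceptual sketch in plain Python. It is not the code in this PR and uses no Spark, Iceberg, or Comet APIs: `DataFileInfo`, `write_partition`, `Committer`, and `run_append` are hypothetical names, JSON is an assumed stand-in for the serialization format, and no files are actually written.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for the metadata a writer reports per data file."""
    path: str
    record_count: int


def write_partition(partition_id: int, rows: List[dict]) -> str:
    """'Writer' side: pretend to write one data file, return serialized metadata.

    Nothing touches disk here; only the description of the file is produced.
    """
    info = DataFileInfo(
        path=f"data/part-{partition_id:05d}.parquet",
        record_count=len(rows),
    )
    return json.dumps(asdict(info))


class Committer:
    """'Committer' side: deserialize the writers' output and commit it once."""

    def __init__(self) -> None:
        # Each entry models one catalog commit as the list of files it added.
        self.commits: List[List[DataFileInfo]] = []

    def commit(self, serialized: Iterable[str]) -> int:
        files = [DataFileInfo(**json.loads(s)) for s in serialized]
        self.commits.append(files)
        # Return the total number of records covered by this commit.
        return sum(f.record_count for f in files)


def run_append(partitions: List[List[dict]]):
    """Run every writer first, then hand all serialized results to one committer."""
    committer = Committer()
    serialized = [write_partition(i, rows) for i, rows in enumerate(partitions)]
    total = committer.commit(serialized)
    return committer, total
```

In this model, only `write_partition` would be a candidate for native acceleration, while `Committer` keeps sole responsibility for the single catalog commit. The checks described under testing (commit count and final table contents) correspond to inspecting `committer.commits` after `run_append`.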
