Re: Single multi-process commit

2021-12-03 Thread Yufei Gu
The WAP feature still requires you to commit each write as a new snapshot; it just doesn't make that snapshot the current one. In that case, each distributed process still needs to commit its changes and generate a new version file. It won't help if you want to avoid highly concurrent commits. Best, Yufei
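
To illustrate the point about WAP, here is a minimal toy model, not Iceberg's actual API. The names `ToyWapTable`, `stage`, and `publish` are hypothetical, and the assumption that every staged write and every publish produces one new metadata version is a simplification made for this sketch. It only shows that staging a write is still a commit, so N processes staging independently still produce N commits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyWapTable:
    """Hypothetical in-memory model of a table with staged snapshots."""
    snapshots: List[List[str]] = field(default_factory=list)
    current: Optional[int] = None   # index of the published snapshot, if any
    metadata_versions: int = 0      # counts commits (assumed: one version each)

    def stage(self, files: List[str]) -> int:
        # Staging still commits a new snapshot; it just is not made current.
        self.snapshots.append(list(files))
        self.metadata_versions += 1
        return len(self.snapshots) - 1

    def publish(self, snapshot_id: int) -> None:
        # Publishing makes a previously staged snapshot the current one.
        self.current = snapshot_id
        self.metadata_versions += 1


wap_table = ToyWapTable()

# Three independent processes each stage their own write.
staged_ids = [wap_table.stage([f"data/proc-{i}.parquet"]) for i in range(3)]

# Three commits happened even though nothing is visible to readers yet.
commits_before_publish = wap_table.metadata_versions
current_before_publish = wap_table.current

wap_table.publish(staged_ids[0])
```

In this model the commit count grows with the number of writing processes, which is the contention the thread is trying to avoid.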

Re: Single multi-process commit

2021-12-03 Thread Kyle Bendickson
I believe this could also be achieved using the Write-Audit-Publish feature, where you audit a set of writes and then choose to publish them. I'm not as familiar with that feature, but you might look into it as well. Thanks, Kyle Bendickson

RE: Single multi-process commit

2021-12-03 Thread Mayur Srivastava
? Thanks, Mayur From: Jack Ye Sent: Friday, December 3, 2021 4:26 PM To: Iceberg Dev List Subject: Re: Single multi-process commit Hi Mayur, I think what you describe of writing in parallel and committing using a coordinator is the strategy used by most of the engines today. The stream of

Re: Single multi-process commit

2021-12-03 Thread Jack Ye
Hi Mayur, I think what you describe, writing in parallel and committing through a coordinator, is the strategy used by most of the engines today. The stream of DataFile objects (statistics collected from written data files) is passed to the coordinator to do a single commit. In Spark, it's passed as Write
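
The coordinator strategy described above can be sketched roughly as follows. This is a self-contained toy, not Iceberg or Spark code: `DataFileInfo`, `ToyTable`, `write_partition`, and `run_job` are hypothetical names introduced for illustration, and threads stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for the per-file metadata a writer reports."""
    path: str
    record_count: int


@dataclass
class ToyTable:
    """Hypothetical in-memory table; each commit creates one snapshot."""
    snapshots: List[Tuple[DataFileInfo, ...]] = field(default_factory=list)

    def commit_append(self, files: List[DataFileInfo]) -> None:
        # One commit yields one snapshot, however many files it adds.
        self.snapshots.append(tuple(files))


def write_partition(task_id: int, rows: List[int]) -> DataFileInfo:
    # A worker "writes" its data file and returns only the file metadata.
    # It never touches table metadata, so workers cannot conflict.
    return DataFileInfo(path=f"data/part-{task_id:05d}.parquet",
                        record_count=len(rows))


def run_job(table: ToyTable, partitions: List[List[int]]) -> List[DataFileInfo]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        files = list(pool.map(write_partition,
                              range(len(partitions)), partitions))
    # Only the coordinator commits, once, after all workers have finished.
    table.commit_append(files)
    return files


table = ToyTable()
files = run_job(table, [[1, 2, 3], [4, 5], [6]])
```

However many workers write in parallel, the table sees exactly one commit, so there is no commit contention between them.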