Re: [DISCUSS] Write-audit-publish support

2019-11-11 Thread Miao Wang
Thanks!
Miao

From: Ryan Blue
Reply-To: "dev@iceberg.apache.org", "rb...@netflix.com"
Date: Monday, November 11, 2019 at 11:54 AM
To: Anton Okolnychyi
Cc: Iceberg Dev List, Ashish Mehta
Subject: Re: [DISCUSS] Write-audit-publish support

I just had a direct request for…

Re: [DISCUSS] Write-audit-publish support

2019-11-11 Thread Ryan Blue
>>>> …summary of Snapshot, but clueless about the official recommendation on
>>>> committing that snapshot. I can think of cherry-picking Appended/Deleted
>>>> files, but don't know the nuances of missing something important with this.
>>>>
>>>> Thanks,
>>>> …

Re: [DISCUSS] Write-audit-publish support

2019-11-11 Thread Anton Okolnychyi
> …on committing that snapshot. I can think of cherry-picking Appended/Deleted
> files, but don't know the nuances of missing something important with this.
>
> Thanks,
> -Ashish
>
> ---------- Forwarded message ---------
> From: Ryan Blue
> …

Re: [DISCUSS] Write-audit-publish support

2019-11-08 Thread Ryan Blue
>>> …but clueless about the official recommendation on
>>> committing that snapshot. I can think of cherry-picking Appended/Deleted
>>> files, but don't know the nuances of missing something important with this.
>>>
>>> Thanks,
>>> -Ashish
>>> …

Re: [DISCUSS] Write-audit-publish support

2019-11-08 Thread Ashish Mehta
>> …that snapshot. I can think of cherry-picking Appended/Deleted
>> files, but don't know the nuances of missing something important with this.
>>
>> Thanks,
>> -Ashish
>>
>>> ---------- Forwarded message ---------
>>> From: Ryan Blue

Re: [DISCUSS] Write-audit-publish support

2019-11-08 Thread Ryan Blue
> …know the nuances of missing something important with this.
>
> Thanks,
> -Ashish
>
>> ---------- Forwarded message ---------
>> From: Ryan Blue
>> Date: Wed, Jul 31, 2019 at 4:41 PM
>> Subject: Re: [DISCUSS] Write-audit-publish support
>> To: Ed…

[DISCUSS] Write-audit-publish support

2019-11-08 Thread Ashish Mehta
…don't know the nuances of missing something important with this.

Thanks,
-Ashish

> ---------- Forwarded message ---------
> From: Ryan Blue
> Date: Wed, Jul 31, 2019 at 4:41 PM
> Subject: Re: [DISCUSS] Write-audit-publish support
> To: Edgar Rodriguez
> Cc: Iceberg Dev…
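One way to think about the nuance Ashish raises is a minimal in-memory model, not Iceberg's API: each snapshot records the data files it added and deleted relative to its parent, and publishing replays those changes on top of whatever snapshot is current. The names Snapshot, cherry_pick and CherryPickConflict are hypothetical. The only conflict this sketch detects is a file that the staged write deleted but a later commit already removed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: live files plus the changes relative to its parent."""
    snapshot_id: int
    parent_id: Optional[int]
    files: FrozenSet[str]
    added: FrozenSet[str] = frozenset()
    deleted: FrozenSet[str] = frozenset()


class CherryPickConflict(Exception):
    """Raised when a staged change cannot be replayed on the current state."""


def cherry_pick(current: Snapshot, staged: Snapshot, new_id: int) -> Snapshot:
    """Replay the staged snapshot's added/deleted files on top of `current`."""
    # A file the staged write deleted must still be live; otherwise a later
    # commit already removed or replaced it and the replay would be unsafe.
    missing = staged.deleted - current.files
    if missing:
        raise CherryPickConflict(f"already removed: {sorted(missing)}")
    files = (current.files - staged.deleted) | staged.added
    return Snapshot(new_id, current.snapshot_id, files, staged.added, staged.deleted)
```

Appended files replay cleanly onto any newer state. A delete or overwrite that passes this check can still be stale if it was computed from data a later commit changed, so a real publish step would need stronger validation than this sketch provides.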

Re: [DISCUSS] Write-audit-publish support

2019-07-31 Thread Ryan Blue
Hi everyone,

I've added PR #342 to the Iceberg repository with our WAP changes. Please have a look if you are interested in this.

On Mon, Jul 22, 2019 at 11:05 AM Edgar Rodriguez wrote:
> I think this use case is pretty helpful in most data…

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Edgar Rodriguez
I think this use case is pretty helpful in most data environments; we use the same sort of stage-check-publish pattern to run quality checks. One question: if, say, the audit fails, is there a way to expire the snapshot, or what workflow would follow?

Best,
Edgar

On Mon, Jul 22,…
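A minimal sketch of one possible answer, assuming a failed audit simply discards the staged snapshot: because it was never made current, it is not in the current table's history and can be dropped, and any data files only it referenced become candidates for cleanup. Table, Snapshot, history and expire are hypothetical names in an in-memory model; in Iceberg itself this would go through snapshot expiration (ExpireSnapshots), which the sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    files: FrozenSet[str]


@dataclass
class Table:
    """Hypothetical in-memory table: all snapshots plus a pointer to the current one."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None

    def history(self) -> Set[int]:
        """IDs reachable from the current snapshot by following parent links."""
        ids: Set[int] = set()
        sid = self.current_id
        while sid is not None and sid in self.snapshots:
            ids.add(sid)
            sid = self.snapshots[sid].parent_id
        return ids

    def expire(self, snapshot_id: int) -> Set[str]:
        """Drop a snapshot outside the current history; return files nothing else references."""
        if snapshot_id in self.history():
            raise ValueError("snapshot is part of the current table history")
        dropped = self.snapshots.pop(snapshot_id)
        still_referenced: Set[str] = set()
        for snap in self.snapshots.values():
            still_referenced |= snap.files
        return set(dropped.files - still_referenced)
```

In this model a failed audit calls expire on the staged ID and deletes the returned files; the current table state never changes.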

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Mouli Mukherjee
This would be super helpful. We have a similar workflow where we run some validation before letting downstream consumers see the changes.

Best,
Mouli

On Mon, Jul 22, 2019 at 9:18 AM Filip wrote:
> This definitely sounds interesting. Quick question on whether this
> presents impact on the current U…

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Filip
This definitely sounds interesting. Quick question: does this have any impact on the current Upserts spec? Or is the intent to support this only for append-only commits?

On Mon, Jul 22, 2019 at 6:51 PM Ryan Blue wrote:
> Audits run on the snapshot by setting the…

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Ryan Blue
Audits run on the snapshot by setting the snapshot-id read option to read the WAP snapshot, even though it is not (yet) the current table state. This is documented in the time travel section of the Iceberg site. We added a stageOnly method to…
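To make the mechanism concrete, here is a minimal in-memory model that assumes nothing about Iceberg's actual classes: a stage-only commit records a snapshot without moving the current pointer, and a reader that passes an explicit snapshot ID (the role the snapshot-id read option plays above) sees the staged data while default readers still see the old state. Table, commit(stage_only=...) and read(snapshot_id=...) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    files: FrozenSet[str]


@dataclass
class Table:
    """Hypothetical in-memory table; not Iceberg's Table interface."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None

    def commit(self, files: Iterable[str], stage_only: bool = False) -> int:
        """Add a snapshot; move the current pointer only if stage_only is False."""
        sid = max(self.snapshots, default=0) + 1
        self.snapshots[sid] = Snapshot(sid, self.current_id, frozenset(files))
        if not stage_only:
            self.current_id = sid
        return sid

    def read(self, snapshot_id: Optional[int] = None) -> Set[str]:
        """Read the current state by default, or a specific (possibly staged) snapshot."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return set() if sid is None else set(self.snapshots[sid].files)
```

An auditor would call read(snapshot_id=staged_id), while every other reader keeps calling read() and sees only published data.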

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Anton Okolnychyi
I would also support adding this to Iceberg itself. I think we have a use case where we can leverage this. @Ryan, could you also provide more info on the audit process?

Thanks,
Anton

> On 20 Jul 2019, at 04:01, RD wrote:
>
> I think this could be useful. When we ingest data from Kafka, we do…

Re: [DISCUSS] Write-audit-publish support

2019-07-19 Thread RD
I think this could be useful. When we ingest data from Kafka, we do a predefined set of checks on the data. We could potentially use something like this to check for sanity before publishing. How is the auditing process supposed to find the new snapshot, since it is not accessible from the table…
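One possible answer, sketched under an assumption: if the writer tags each staged snapshot with a job identifier in its snapshot summary (the key wap.id below is assumed for illustration), the auditor can list all snapshots, keep those carrying its ID, and skip any that are already in the current table history. Snapshot, WAP_ID_KEY and find_staged are hypothetical names in an in-memory model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

WAP_ID_KEY = "wap.id"  # assumed summary key for this sketch


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str] = field(default_factory=dict)


def find_staged(snapshots: List[Snapshot], current_id: Optional[int], wap_id: str) -> List[int]:
    """Return IDs of snapshots tagged with wap_id that are not in the current history."""
    by_id = {s.snapshot_id: s for s in snapshots}
    published: Set[int] = set()
    sid = current_id
    while sid is not None and sid in by_id:
        published.add(sid)
        sid = by_id[sid].parent_id
    return [
        s.snapshot_id
        for s in snapshots
        if s.summary.get(WAP_ID_KEY) == wap_id and s.snapshot_id not in published
    ]
```

The job that wrote the data only has to pass its own ID to the audit step; the audit never needs the staged snapshot to be reachable from the current table state.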

[DISCUSS] Write-audit-publish support

2019-07-19 Thread Ryan Blue
Hi everyone,

At Netflix, we have a pattern for building ETL jobs where we write data, then audit the result before publishing the data that was written to a final table. We call this WAP, for write, audit, publish.

We’ve added support in our Iceberg branch. A WAP write creates a new table snapshot…
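The write, audit, publish sequence can be sketched as a minimal in-memory model, assuming each snapshot holds the full set of table data files and that publish is a fast-forward of the current-snapshot pointer, refused if another commit landed after staging. Table, stage, publish and write_audit_publish are hypothetical names, not Iceberg's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    files: FrozenSet[str]


@dataclass
class Table:
    """Hypothetical in-memory table with a movable current-snapshot pointer."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None

    def stage(self, files: Iterable[str]) -> int:
        """Write: record a snapshot without changing the current table state."""
        sid = max(self.snapshots, default=0) + 1
        self.snapshots[sid] = Snapshot(sid, self.current_id, frozenset(files))
        return sid

    def publish(self, sid: int) -> None:
        """Publish: fast-forward to the staged snapshot if nothing was committed since."""
        if self.snapshots[sid].parent_id != self.current_id:
            raise RuntimeError("table changed after staging; changes must be re-applied")
        self.current_id = sid


def write_audit_publish(
    table: Table, files: Iterable[str], audit: Callable[[FrozenSet[str]], bool]
) -> bool:
    """Stage `files`, audit only the staged snapshot, then publish or discard it."""
    sid = table.stage(files)
    if not audit(table.snapshots[sid].files):
        del table.snapshots[sid]  # failed audit: readers never saw this data
        return False
    table.publish(sid)
    return True
```

A fast-forward publish is the simplest choice because the audited snapshot is exactly what readers will see. When other commits land in between, the staged changes have to be replayed instead, which is where the cherry-picking nuances raised later in the thread come in.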