Re: Spark + Iceberg ; How to ensure idempotent updates and deduplication

2023-07-20 Thread Ryan Blue
Here's another resource for write-audit-publish, an example notebook from Robin Moffatt: https://github.com/tabular-io/docker-spark-iceberg/blob/main/spark/notebooks/Iceberg%20-%20Write-Audit-Publish%20(WAP)%20with%20Branches.ipynb I don't think that's going to quite cover what Nirav was asking th

Re: Cherrypick "delete" snapshot

2023-07-20 Thread Ryan Blue
Pucheng, For cherry-pick, we've only implemented the operations that we know can be safely cherry-picked without knowing more context about the operation. Right now, those are cases where the operation is actually a fast-forward (not actually a cherry-pick), an append that only adds new data, or

Cherrypick "delete" snapshot

2023-07-20 Thread Pucheng Yang
Hi community, I have a table with the history below: null -> s1: overwrite (partition1) -> s2: overwrite (partition2) -> s3 (current): delete (partition1). I want to undo the commit that generated s3 because it is a bad commit, and my goal is to have a history like the one below: null -> s1: overwri

Re: Spark + Iceberg ; How to ensure idempotent updates and deduplication

2023-07-20 Thread Christina Larsen
Iceberg supports the write-audit-publish workflow that does essentially what you want at a batch level, either with individual snapshots or branches, but the former at least is not well documented by Iceberg at present. I don't have bandwidth to do a more detailed write-up on it at present but some

Spark + Iceberg ; How to ensure idempotent updates and deduplication

2023-07-20 Thread Nirav Patel
Hi, I'm using Spark Structured Streaming to append to a partitioned Iceberg table. I am using a custom Iceberg catalog (GCP BigLake Iceberg catalog) to upsert data into Iceberg tables that are backed by the GCP BigLake metastore. There are multiple ways to append streaming data into a partitioned table. One

Re: Discussion about applying spotless for scala code

2023-07-20 Thread Liu Xianyang
Thanks Ryan and community for the feedback. -Xianyang From: Ryan Blue Sent: July 19, 2023 17:44 To: dev@iceberg.apache.org Subject: Re: Discussion about applying spotless for scala code We talked about this more in the community sync today and came to the conclusion th

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-20 Thread Ashok Krishna
Hello, I'm interested in helping in whatever way I can. Count me in. Thanks, Ashok On Thu, Jul 20, 2023 at 2:01 PM Eduard Tudenhoefner wrote: > Having an Iceberg conference sounds great. I'd love to help out here as > well, so count me in! > > Eduard > > On Thu, Jul 20, 2023 at 12:18 AM Nan Zhu

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-20 Thread Eduard Tudenhoefner
Having an Iceberg conference sounds great. I'd love to help out here as well, so count me in! Eduard On Thu, Jul 20, 2023 at 12:18 AM Nan Zhu wrote: > Hello! > > Glad to help here and also present our use case on iceberg > > Thanks! > > Nan > > On Wed, Jul 19, 2023 at 3:00 PM Jay Dave wrote: >