Re: Equality deletes with Flink - design question

2024-04-10 Thread Péter Váry
Hi Gabor, I don't know about the historical reasons, so I was hoping someone familiar with the history would chime in. Since there was no response, here are my thoughts: When writing CDC, we need every record with the same primary key on the same writer (this is a must). Also there are good

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-10 Thread Nirav Patel
Hi Prashant, Thanks for responding and sharing the related issue. The issue that has been fixed seems very much related. However, in our case the streaming job was stuck for over 3 days. You are saying that, because it is scanning all the manifest lists, `latestOffSet` may not return for that long! Also why

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-10 Thread Prashant Singh
Hi Nirav, Thanks for reporting the issue, let me try answering your question below :) > We are encountering the following issue, where a Spark streaming read job from an Iceberg table stays stuck after some maintenance jobs (rewrite_data_files and rewrite_manifests) have been run in parallel on the same tab

Fwd: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-10 Thread Nirav Patel
We are encountering the following issue, where a Spark streaming read job from an Iceberg table stays stuck after some maintenance jobs (rewrite_data_files and rewrite_manifests) have been run in parallel on the same table. https://github.com/apache/iceberg/issues/10117 I'm trying to understand what creates

Re: Equality deletes with Flink - design question

2024-04-10 Thread Gabor Kaszab
Hey Iceberg People, Just pinging this thread here. Any Flink expertise on the above questions is appreciated! :) Gabor On Mon, Mar 25, 2024 at 3:43 PM Gabor Kaszab wrote: > Hey Iceberg Community, > > I've recently had the chance to examine Iceberg's equality delete support > in a multi-engine

Re: Looking for help with Pyflink and Iceberg

2024-04-10 Thread Fokko Driesprong
Hey Frank, Thanks for reaching out here. I spent some cycles a while ago on removing the Hadoop requirement from Flink. There were a lot of APIs that needed to change, which caused me not to follow through with it. But this might help you in getting PyFlink up and running since it contains an example s