Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-15 Thread Nirav Patel
Hi, > Can you describe a bit more on your ingestion rate ? > what exactly were the read limits? The streaming job ingests at most 1M records per batch. The trigger interval is 1 minute, which seems fine for regular stream processing. Our average per-minute record count is far lower than that. >

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-14 Thread Prashant Singh
Hi Nirav, > in our case streaming job was stuck for over 3 days 3 days seems too much. What exactly were the read limits, and how many files were there before and after compaction? Can you describe a bit more on your ingestion rate? > Also why not add `nextValidSnapshot(Snapshot curSnapshot)` check at the

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-10 Thread Nirav Patel
Hi Prashant, Thanks for responding and sharing the related issue. The issue that's been fixed seems very much related. However, in our case streaming job was stuck for over 3 days. You are saying that because it scans the whole manifest list, `latestOffSet` may not return for that long! Also why

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-10 Thread Prashant Singh
Hi Nirav, Thanks for reporting the issue. Let me try answering your question below :) > We are encountering the following issue where a Spark streaming read job from an Iceberg table stays stuck after some maintenance jobs (rewrite_data_files and rewrite_manifests) have been run in parallel on the same tab