[I] [Druid Iceberg Extension] Implement incremental/snapshot-based ingestion to support real-time use cases (druid)

via GitHub Mon, 06 Apr 2026 04:19:44 -0700


Shekharrajak opened a new issue, #19268:
URL: https://github.com/apache/druid/issues/19268


   ### Description
   Currently, Druid performs a full scan of the Iceberg table on every ingestion run. For large Iceberg tables (billions of rows), this makes real-time ingestion impractical:
   - Re-scanning a 50 TB table every minute is not feasible
   - Compute costs are prohibitive
   - SLA requirements (sub-minute latency) cannot be met
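
A rough back-of-envelope check of the first point, as a sketch only. It assumes decimal terabytes and that each run must read the whole table within the one-minute interval; both are assumptions for illustration, not measurements from a real cluster.

```python
# Assumptions (not from a real deployment): decimal units, one full read per run.
TABLE_BYTES = 50 * 10**12   # 50 TB table
INTERVAL_SECONDS = 60       # one ingestion run per minute

# Sustained read throughput needed to finish a full scan inside one interval.
required_bytes_per_second = TABLE_BYTES / INTERVAL_SECONDS
required_gb_per_second = required_bytes_per_second / 10**9

print(f"{required_gb_per_second:.0f} GB/s sustained read throughput")
```

Under these assumptions the scan would need to sustain on the order of 833 GB/s, before any parsing or indexing cost is counted.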
   
   ### Motivation
   
   A financial trading platform needs to ingest new stock trades within 30 seconds of arrival. Its Iceberg table holds 50 billion historical rows.

   Current Behavior:
   ingestion:
     type: iceberg
     table: stock_trades
     schedule: "@every 1m"
     
   # Result:
   # - Every minute: a full table scan of 50 billion rows
   # - Each scan takes 45 minutes, so the 30-second SLA is missed
   # - Cost: high
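
For contrast with the full-scan behavior above, here is a minimal sketch of what snapshot-based incremental planning could look like. It is a toy in-memory model, not the Druid or Iceberg API: `Snapshot`, `IncrementalReader`, and `plan` are hypothetical names, and the model assumes append-only snapshots ordered oldest to newest (real tables also have deletes, overwrites, and compaction, which this ignores).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for one append-only table snapshot."""
    snapshot_id: int
    added_files: tuple[str, ...]


@dataclass
class IncrementalReader:
    """Remembers the last ingested snapshot and plans only newer files."""
    last_snapshot_id: Optional[int] = None

    def plan(self, history: list[Snapshot]) -> list[str]:
        # history is assumed to be ordered oldest to newest.
        if self.last_snapshot_id is None:
            pending = history  # first run: initial full load
        else:
            ids = [s.snapshot_id for s in history]
            if self.last_snapshot_id not in ids:
                raise ValueError("checkpointed snapshot is no longer in history")
            pending = history[ids.index(self.last_snapshot_id) + 1:]

        files = [f for s in pending for f in s.added_files]
        # Simplification: a real system would advance the checkpoint only
        # after the ingestion task has succeeded.
        if pending:
            self.last_snapshot_id = pending[-1].snapshot_id
        return files


history = [
    Snapshot(1, ("a.parquet", "b.parquet")),
    Snapshot(2, ("c.parquet",)),
]
reader = IncrementalReader()
first = reader.plan(history)    # initial load: files from snapshots 1 and 2

history.append(Snapshot(3, ("d.parquet",)))
second = reader.plan(history)   # incremental run: only files from snapshot 3
```

With this approach the work per run scales with the data appended since the last checkpoint rather than with the 50 billion historical rows.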


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

