Shekharrajak opened a new issue, #19268:
URL: https://github.com/apache/druid/issues/19268
### Description
Currently, Druid performs a full table scan on every ingestion run. For large
Iceberg tables (billions of rows), this makes near-real-time ingestion impractical:
- Re-scanning a 50 TB table every minute is not feasible
- Compute costs are prohibitive, because every run re-reads data that was already ingested
- Sub-minute latency SLAs cannot be met
### Motivation
Example: a financial trading platform needs to ingest new stock trades within 30
seconds of arrival. Its Iceberg table holds 50 billion historical rows.
Current Behavior:
ingestion:
  type: iceberg
  table: stock_trades
  schedule: "@every 1m"

# Result:
# - Every 1 minute: full table scan of 50 billion rows
# - Each scan takes 45 minutes (fails the SLA)
# - Cost: high
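
To make the gap concrete, here is a back-of-the-envelope calculation using only the figures quoted above (50 billion rows, a 45-minute scan, a 1-minute schedule, a 30-second latency target). It assumes scan time scales linearly with row count, which is a simplification, and the variable names are illustrative rather than taken from Druid.

```python
# Figures quoted in the issue text; everything else is derived from them.
TOTAL_ROWS = 50_000_000_000   # historical rows in the Iceberg table
SCAN_SECONDS = 45 * 60        # observed duration of one full table scan
SCHEDULE_SECONDS = 60         # ingestion interval ("@every 1m")
SLA_SECONDS = 30              # required ingestion latency


def rows_per_second(rows: int, seconds: float) -> float:
    """Average scan throughput, assuming time scales linearly with rows."""
    return rows / seconds


observed = rows_per_second(TOTAL_ROWS, SCAN_SECONDS)
needed_for_schedule = rows_per_second(TOTAL_ROWS, SCHEDULE_SECONDS)
needed_for_sla = rows_per_second(TOTAL_ROWS, SLA_SECONDS)

# How much faster a full scan would have to be to fit each time budget.
speedup_for_schedule = needed_for_schedule / observed
speedup_for_sla = needed_for_sla / observed

print(f"observed throughput:       {observed:,.0f} rows/s")
print(f"needed to finish in 1 min: {needed_for_schedule:,.0f} rows/s "
      f"({speedup_for_schedule:.0f}x faster)")
print(f"needed to finish in 30 s:  {needed_for_sla:,.0f} rows/s "
      f"({speedup_for_sla:.0f}x faster)")
```

Under these assumptions a full scan would need to run roughly 45 times faster just to finish before the next scheduled run starts, and roughly 90 times faster to meet the 30-second target.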