Shekharrajak opened a new issue, #19267:
URL: https://github.com/apache/druid/issues/19267

   Currently, the Druid Iceberg extension reads ALL columns from Iceberg data 
files regardless of which columns are needed for ingestion. For tables with 
hundreds of columns, this causes:
   - 10-100x more data read from storage than ingestion needs
   - Increased memory pressure during ingestion
   - Slower query performance
   - Higher cloud storage egress costs
   
   Example: an e-commerce analytics team has an Iceberg table with 150 columns 
but needs only 5 of them (timestamp, product_id, category, price, quantity) for 
its Druid dashboard. Currently, Druid reads all 150 columns, causing:
   - Query time:
   - Memory:
   - Data transfer: 
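
   To make the scale of the example concrete, here is a minimal sketch of what column pruning would select and how much read amplification it avoids. It is not the extension's code: `project_columns`, `read_amplification`, and the 145 `extra_*` filler columns are hypothetical, and the ratio assumes every column takes roughly the same number of bytes, which real tables rarely do.

```python
from typing import List, Sequence


def project_columns(table_columns: Sequence[str], required: Sequence[str]) -> List[str]:
    """Return the required columns in table order, failing on unknown names."""
    available = set(table_columns)
    missing = [name for name in required if name not in available]
    if missing:
        raise KeyError(f"columns not in table schema: {missing}")
    wanted = set(required)
    return [name for name in table_columns if name in wanted]


def read_amplification(total_columns: int, projected_columns: int) -> float:
    """Ratio of columns read without pruning to columns read with pruning.

    Assumption: all columns are about the same size on disk.
    """
    if projected_columns <= 0 or projected_columns > total_columns:
        raise ValueError("projected_columns must be in 1..total_columns")
    return total_columns / projected_columns


# The five columns named in the issue, plus 145 hypothetical filler columns.
needed = ["timestamp", "product_id", "category", "price", "quantity"]
schema = needed + [f"extra_{i}" for i in range(145)]

projected = project_columns(schema, needed)
amplification = read_amplification(len(schema), len(projected))
print(projected)
print(amplification)  # 30.0 under the equal-column-size assumption
```

   Under that assumption, reading 150 columns instead of 5 is a 30x overhead, which falls inside the 10-100x range quoted above. Actual savings depend on column types, encoding, and compression.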


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

