Shekharrajak opened a new issue, #19267: URL: https://github.com/apache/druid/issues/19267
Currently, the Druid Iceberg extension reads ALL columns from Iceberg data files regardless of which columns are needed for ingestion. For tables with hundreds of columns, this causes:

- 10-100x unnecessary data read from storage
- Increased memory pressure during ingestion
- Slower query performance
- Higher cloud storage egress costs

An e-commerce analytics team has an Iceberg table with 150 columns but only needs 5 columns (timestamp, product_id, category, price, quantity) for their Druid dashboard. Currently, Druid reads all 150 columns, causing:

- Query time:
- Memory:
- Data transfer:
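To make the requested behavior concrete, here is a minimal sketch of column projection for the 150-column use case. It is not the Druid or Iceberg API: `ColumnarFile`, `read_projected`, and `column_reduction_factor` are hypothetical names, and the data file is modeled as an in-memory dictionary purely for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory stand-in for a columnar data file: column name -> values.
ColumnarFile = Dict[str, List[object]]


def read_projected(data_file: ColumnarFile, required: Sequence[str]) -> ColumnarFile:
    """Return only the required columns, leaving the others untouched."""
    missing = [name for name in required if name not in data_file]
    if missing:
        raise KeyError(f"columns not present in file: {missing}")
    return {name: data_file[name] for name in required}


def column_reduction_factor(total_columns: int, required_columns: int) -> float:
    """How many times fewer columns a projected read touches than a full read."""
    if required_columns <= 0 or required_columns > total_columns:
        raise ValueError("required_columns must be in 1..total_columns")
    return total_columns / required_columns


# Build a 150-column table matching the use case: 5 needed columns plus 145 unused ones.
needed = ["timestamp", "product_id", "category", "price", "quantity"]
table: ColumnarFile = {name: [0, 1, 2] for name in needed}
table.update({f"unused_{i}": [0, 1, 2] for i in range(145)})

projected = read_projected(table, needed)
factor = column_reduction_factor(len(table), len(projected))
print(len(table), len(projected), factor)  # prints: 150 5 30.0
```

Reading 5 of 150 columns touches 30x fewer columns, which falls inside the 10-100x range quoted above. The actual byte savings would depend on the width and compression of each column, so the column count is only a rough proxy.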
