OK, after adding more instrumentation I see that Reader::estimateStatistics
may be the culprit.
It looks like estimating the stats may be performing a full-table estimate, and that is
why it is so slow. Does anyone know whether it is possible to
avoid Reader::estimateStatistics?
Also, does estimateStatistics use appr
Thanks @Jingsong for the reply.
Yes, one additional data point about the table:
it is an Avro table generated from stream ingestion. We expect a
couple of thousand snapshots to be created daily.
We are using the appendsBetween API, and I think any compaction operation
will break the API, but I will ta
Hi Sud,
The batch read of the Iceberg table should just read the latest snapshot.
I think that in this case your large tables have a large number of manifest
files.
1. The simple way is to reduce the number of manifest files:
- To reduce the number of manifest files, you can try
`Actions.rewriteManifests`(Than
Hi Iceberg devs,
We are trying to root-cause an issue where the driver gets stuck when trying to
read comparatively large tables (> 2000 snapshots).
When I looked at the thread dump of the driver's main thread, I saw
that the thread is stuck in planning tasks. I also noticed that iceberg-worker-pool
is
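
For anyone wanting to repeat the thread-dump check described in this message, here is a minimal sketch, not part of the original investigation. It assumes the dump is plain text in the usual HotSpot `jstack` layout (a quoted thread name on a header line, followed by a `java.lang.Thread.State:` line). The function name `summarize_threads` and its `prefix` parameter are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Header line of a thread entry in a HotSpot jstack dump: starts with the
# quoted thread name, e.g.  "some-thread-name" #12 daemon prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# State line that follows the header, e.g.
#    java.lang.Thread.State: WAITING (parking)
THREAD_STATE = re.compile(r'^\s*java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def summarize_threads(dump_text, prefix="iceberg-worker-pool"):
    """Count thread states for threads whose name starts with `prefix`.

    Hypothetical helper: it only reads the text of a dump and does not
    talk to the JVM or to Iceberg.
    """
    counts = Counter()
    current = None  # name of the entry being read, if it matches the prefix
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            name = header.group("name")
            current = name if name.startswith(prefix) else None
            continue
        state = THREAD_STATE.match(line)
        if state and current is not None:
            counts[state.group("state")] += 1
            current = None  # count at most one state per thread entry
    return dict(counts)
```

One possible use: capture the driver's dump with `jstack <pid> > dump.txt`, then call `summarize_threads(open("dump.txt").read())` to see how many worker-pool threads are in each state. Passing `prefix="main"` gives the same view for the main thread.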