Thanks Ryan and Jingsong. I will add one more TODO to see if we can use
Spark to parallelize estimation even when predicate pushdown is done
(Spark does this for file-system-based tables), and possibly for manifest
readers.
I will try to submit a PR upstream for adding options and will create iss
Thanks Sud for the in-depth debugging. And thanks Ryan for the explanation.
+1 to have a table property to disable stats estimation.
IIUC, the difference between stats estimation and scan with filters is
mainly in the partition filters:
Iceberg uses filter push-down to complete partition pruning. So
Hey, great question. I just caught up on the other thread, but let me
provide some context here.
Spark uses the stats estimation here to determine whether or not to
broadcast. If we returned a default value, then Spark wouldn't be able to
use Iceberg tables in broadcast joins. Even though Spark wo
As per the Javadoc, estimateStatistics does not take any operators into
account. Is there a reason why the Iceberg reader implements it? I wonder
if it would help to make it configurable and return a default value.
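To make the trade-off concrete, here is a minimal sketch in plain Java with no Spark or Iceberg dependency. The class and method names (BroadcastSketch, estimateSizeInBytes, canBroadcast) and the estimationEnabled flag are hypothetical and only stand in for the proposed option. The two constants assume Spark's documented defaults for spark.sql.autoBroadcastJoinThreshold (10 MB) and spark.sql.defaultSizeInBytes (Long.MAX_VALUE). It only illustrates why a default-sized table would not be picked for a broadcast join.

```java
import java.util.Arrays;

// Hypothetical sketch, not Spark or Iceberg code.
class BroadcastSketch {
    // Assumed default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
    static final long AUTO_BROADCAST_JOIN_THRESHOLD = 10L * 1024 * 1024;
    // Assumed default of spark.sql.defaultSizeInBytes, used when a source reports no stats.
    static final long DEFAULT_SIZE_IN_BYTES = Long.MAX_VALUE;

    // When estimation is disabled, fall back to the default size;
    // otherwise sum the sizes of the files the scan would read.
    static long estimateSizeInBytes(boolean estimationEnabled, long[] fileSizes) {
        if (!estimationEnabled) {
            return DEFAULT_SIZE_IN_BYTES;
        }
        return Arrays.stream(fileSizes).sum();
    }

    // A relation is a broadcast candidate only if its estimated size
    // is non-negative and at or below the threshold.
    static boolean canBroadcast(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= AUTO_BROADCAST_JOIN_THRESHOLD;
    }

    public static void main(String[] args) {
        long[] files = {1L << 20, 2L << 20}; // two small files, 3 MB in total
        System.out.println(canBroadcast(estimateSizeInBytes(true, files)));
        System.out.println(canBroadcast(estimateSizeInBytes(false, files)));
    }
}
```

With estimation enabled, the 3 MB table is under the threshold; with it disabled, the fallback size is far above the threshold, so the same table would not be considered for broadcast.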
/**
* A mix in interface for {@link DataSourceReader}. Data source readers can implement this
*