findepi commented on code in PR #13454:
URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845463997
##########
datafusion/catalog/src/table.rs:
##########
@@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send {
}
/// Get statistics for this table, if available
+ /// Although not presently used in mainline DataFusion, this allows implementation specific
+ /// behavior for downstream repositories, in conjunction with specialized optimizer rules to
+ /// perform operations such as re-ordering of joins.
Review Comment:
I agree this is the case for ParquetExec / AvroExec etc., where stats come
from file stats.
I see there isn't an ExecutionPlan that would generically wrap a
TableProvider, so I'm not sure why we have this method if it cannot be used
directly in DF.
> However, for systems that have a good catalog / some other source of
> statistical information, I could imagine doing the reordering at the logical
> level
In fact, having a metastore/catalog should not be uncommon. Many data
landscapes center around a catalog, e.g. when Iceberg is involved.
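
To make the catalog case concrete, here is a rough sketch of what logical-level reordering driven by catalog statistics could look like. It deliberately does not use DataFusion's real types: `StatsSource`, `TableStats`, `CatalogTable`, `OpaqueTable` and `order_join_inputs` are all hypothetical stand-ins, and the only idea borrowed from `TableProvider` is an optional statistics hook that returns nothing by default.

```rust
// Hypothetical stand-ins for illustration only; none of these are DataFusion types.

/// Minimal table-level statistics, with the row count possibly unknown.
#[derive(Debug, Clone, PartialEq)]
struct TableStats {
    num_rows: Option<usize>,
}

/// A table that may or may not be able to report statistics.
trait StatsSource {
    fn name(&self) -> &str;

    /// Optional hook: the default reports no statistics.
    fn statistics(&self) -> Option<TableStats> {
        None
    }
}

/// A table backed by a catalog/metastore that already knows its row count.
struct CatalogTable {
    name: String,
    row_count: usize,
}

impl StatsSource for CatalogTable {
    fn name(&self) -> &str {
        &self.name
    }

    fn statistics(&self) -> Option<TableStats> {
        Some(TableStats {
            num_rows: Some(self.row_count),
        })
    }
}

/// A table with no statistics available; it relies on the trait default.
struct OpaqueTable {
    name: String,
}

impl StatsSource for OpaqueTable {
    fn name(&self) -> &str {
        &self.name
    }
}

/// Toy "optimizer rule": order join inputs by ascending known row count.
/// Tables with unknown row counts sort last; the sort is stable, so ties
/// keep their original relative order.
fn order_join_inputs<'a>(tables: &[&'a dyn StatsSource]) -> Vec<&'a str> {
    let mut keyed: Vec<(usize, &'a str)> = tables
        .iter()
        .map(|&t| {
            let rows = t
                .statistics()
                .and_then(|s| s.num_rows)
                .unwrap_or(usize::MAX);
            (rows, t.name())
        })
        .collect();
    keyed.sort_by_key(|&(rows, _)| rows);
    keyed.into_iter().map(|(_, name)| name).collect()
}
```

Calling `order_join_inputs` with a large catalog table, a table without statistics, and a small catalog table would place the small table first and the table without statistics last. A real rule would of course need a proper cost model; this only shows that the decision can be made before any physical plan exists.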