alamb commented on code in PR #13454:
URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845415426
##########
datafusion/catalog/src/table.rs:
##########
@@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send {
}
/// Get statistics for this table, if available
+ /// Although not presently used in mainline DataFusion, this allows implementation specific
+ /// behavior for downstream repositories, in conjunction with specialized optimizer rules to
+ /// perform operations such as re-ordering of joins.
Review Comment:
I think the theory was that the ExecutionPlan (aka physical plan) has access
to more specific information / statistics (like which files will be read,
min/max values, etc.), so join reordering could produce better results when
done at that level.
This is why the current join selection is a physical optimizer pass:
https://github.com/apache/datafusion/blob/main/datafusion/core/src/physical_optimizer/join_selection.rs
However, for systems that have a good catalog or some other source of
statistical information, I could imagine doing the reordering at the logical
level.
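
To make the idea concrete, here is a minimal sketch of a statistics-driven decision about which join input goes where. It is not DataFusion's actual code: `TableStats` and `should_swap_inputs` are hypothetical names, and it assumes the left input is the build side of a hash join, so the smaller input is preferred there. The same comparison could in principle be fed by catalog-level statistics (logical level) or by file-level statistics (physical level).

```rust
/// Hypothetical, simplified stand-in for per-table statistics.
/// (Assumption for illustration; not the DataFusion `Statistics` type.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct TableStats {
    /// Estimated number of rows, or `None` when the source cannot tell.
    num_rows: Option<usize>,
}

/// Returns `true` when the two join inputs should be swapped so that the
/// smaller one ends up on the left (assumed to be the build side).
///
/// If either row count is unknown, the original order is kept, since
/// there is no basis for a decision.
fn should_swap_inputs(left: TableStats, right: TableStats) -> bool {
    match (left.num_rows, right.num_rows) {
        (Some(l), Some(r)) => l > r,
        _ => false,
    }
}
```

For example, with a left input estimated at 1,000,000 rows and a right input at 10 rows, `should_swap_inputs` returns `true`; with a missing estimate on either side it returns `false` and the plan is left alone.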
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]