alamb commented on code in PR #13454:
URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845415426


##########
datafusion/catalog/src/table.rs:
##########
@@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send {
     }
 
     /// Get statistics for this table, if available
+    /// Although not presently used in mainline DataFusion, this allows implementation specific
+    /// behavior for downstream repositories, in conjunction with specialized optimizer rules to
+    /// perform operations such as re-ordering of joins.

Review Comment:
   I think the theory was that the ExecutionPlan (aka physical plan) has access 
to more specific information / statistics (like which files will be read, 
min/max values, etc.), so join reordering could produce better results when 
done at that level.
   
   This is why the current join selection is a physical optimizer pass: 
https://github.com/apache/datafusion/blob/main/datafusion/core/src/physical_optimizer/join_selection.rs
   
   However, for systems that have a good catalog or some other source of 
statistical information, I could imagine doing the reordering at the logical 
level instead.
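
   To make the logical-level idea concrete, here is a minimal sketch of ordering 
join inputs from catalog row counts alone. It does not use any DataFusion API: 
`TableStats`, `order_by_row_count`, and `should_swap` are hypothetical names 
made up for illustration, and a real rule would instead rewrite `LogicalPlan` 
nodes using whatever `TableProvider::statistics` returns. It also assumes the 
joins are inner joins, where reordering inputs does not change the result.

```rust
/// Hypothetical stand-in for catalog-level statistics (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct TableStats {
    name: String,
    /// Estimated row count, or `None` when the catalog has no estimate.
    num_rows: Option<usize>,
}

/// Order inner-join inputs so tables with smaller known row counts come first.
/// Tables without an estimate go last, keeping their original relative order.
fn order_by_row_count(mut tables: Vec<TableStats>) -> Vec<TableStats> {
    // `sort_by_key` is stable; the `is_none()` flag pushes unknowns to the end.
    tables.sort_by_key(|t| (t.num_rows.is_none(), t.num_rows));
    tables
}

/// Decide whether to swap the two sides of a single join so that the smaller
/// input ends up on the left. Only swaps when both estimates are known.
fn should_swap(left: &TableStats, right: &TableStats) -> bool {
    match (left.num_rows, right.num_rows) {
        (Some(l), Some(r)) => l > r,
        // Without statistics on both sides, leave the plan as written.
        _ => false,
    }
}
```

   The conservative fallback (no swap when an estimate is missing) is a design 
choice of this sketch, meant to keep a rule like this from making plans worse 
when the catalog has nothing useful to say.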



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

