findepi commented on code in PR #13454:
URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845463997


##########
datafusion/catalog/src/table.rs:
##########
@@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send {
     }
 
     /// Get statistics for this table, if available
+    /// Although not presently used in mainline DataFusion, this allows implementation specific
+    /// behavior for downstream repositories, in conjunction with specialized optimizer rules to
+    /// perform operations such as re-ordering of joins.

Review Comment:
   I agree this is the case for ParquetExec / AvroExec etc., where stats come from file stats.
   I see there isn't an ExecutionPlan that would generically wrap a TableProvider... Not sure why we have this method then, if it cannot be used directly in DF.
   
   > However, for systems that have a good catalog / some other source of statistical information, I could imagine doing the reordering at the logical level
   
   In fact, having a metastore/catalog should not be uncommon. Many data landscapes center around a catalog, e.g. when Iceberg is involved.
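
   A minimal sketch of the catalog-driven idea, to make the discussion concrete. Everything here is hypothetical and self-contained: `TableStats`, `StatsSource`, `CatalogStats`, and `order_join_inputs` are illustrative names, not DataFusion APIs, and the trait only loosely mirrors the shape of `TableProvider::statistics`. It assumes the catalog records row counts and that a logical-level rule simply puts smaller inputs first.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for table-level statistics (not DataFusion's `Statistics`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TableStats {
    /// Row count, if the source knows it.
    num_rows: Option<usize>,
}

/// Hypothetical source of optional statistics, loosely modeled on
/// the "statistics, if available" contract discussed above.
trait StatsSource {
    fn statistics(&self, table: &str) -> Option<TableStats>;
}

/// Hypothetical catalog/metastore-backed source holding recorded row counts.
struct CatalogStats {
    row_counts: HashMap<String, usize>,
}

impl StatsSource for CatalogStats {
    fn statistics(&self, table: &str) -> Option<TableStats> {
        self.row_counts
            .get(table)
            .map(|&n| TableStats { num_rows: Some(n) })
    }
}

/// Order join inputs by ascending row count. Tables without statistics
/// sort last; the sort is stable, so ties keep their original order.
fn order_join_inputs<'a>(source: &dyn StatsSource, tables: &[&'a str]) -> Vec<&'a str> {
    let mut ordered: Vec<&'a str> = tables.to_vec();
    ordered.sort_by_key(|&t| {
        source
            .statistics(t)
            .and_then(|s| s.num_rows)
            .unwrap_or(usize::MAX)
    });
    ordered
}
```

   The point of the sketch is only that such a rule needs nothing from the physical plan: if the catalog can answer `statistics`, the ordering decision can be made before any file-level stats are read.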
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

