rroelke opened a new issue, #13439:
URL: https://github.com/apache/datafusion/issues/13439

   The extent of the documentation for 
[`TableProvider::statistics`](https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.statistics)
 in version 43.0.0 is:
   ```
   Get statistics for this table, if available
   ```
   
   This offers no explanation as to how the statistics will or will not be used.
   
   A user with experience in analytical database engines writing a custom 
`TableProvider` implementation may suspect that `TableProvider::statistics` is 
used by the DataFusion query optimizer to determine join orders, perhaps among 
other things.
   
   However, this conclusion is apparently incorrect, which I deduce from the 
following pieces of evidence:
   1) I am a user fitting that description and found that my custom 
`TableProvider::statistics` was not called in the presence of a join query
   2) `cargo check --workspace --tests` runs with no errors if I remove the `fn 
statistics` declaration from the `trait TableProvider` definition
   3) the source code for the rule that changes join orders shows that it calls 
`ExecutionPlan::statistics` instead.
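
To illustrate the distinction in point 3, here is a minimal sketch using simplified stand-in traits. These are not the real DataFusion traits or signatures; `MyTable`, `ScanExec`, and `should_swap_join_inputs` are hypothetical names invented for illustration. It assumes a rule that consults only the plan-level statistics, so a table-level `statistics` has an effect only if the provider forwards it to the plan returned by `scan`.

```rust
// Simplified stand-ins for illustration only; NOT the real DataFusion API.

#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

/// Stand-in for a physical plan node. The hypothetical rule below reads
/// statistics from here, never from the table provider.
trait ExecutionPlan {
    fn statistics(&self) -> Statistics;
}

/// Stand-in for the table abstraction.
trait TableProvider {
    /// "Get statistics for this table, if available": defaults to none.
    fn statistics(&self) -> Option<Statistics> {
        None
    }

    /// Produce the plan node that scans this table.
    fn scan(&self) -> Box<dyn ExecutionPlan>;
}

/// Hypothetical scan node that simply stores the statistics it was given.
struct ScanExec {
    stats: Statistics,
}

impl ExecutionPlan for ScanExec {
    fn statistics(&self) -> Statistics {
        self.stats.clone()
    }
}

/// Hypothetical custom table with a known row count.
struct MyTable {
    rows: usize,
}

impl TableProvider for MyTable {
    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics { num_rows: Some(self.rows) })
    }

    fn scan(&self) -> Box<dyn ExecutionPlan> {
        // The table-level statistics matter to the rule below only because
        // they are forwarded into the plan node here.
        let stats = self.statistics().unwrap_or(Statistics { num_rows: None });
        Box::new(ScanExec { stats })
    }
}

/// Hypothetical join-ordering check: swap inputs when the left side is known
/// to be larger than the right. It sees only `ExecutionPlan::statistics`.
fn should_swap_join_inputs(left: &dyn ExecutionPlan, right: &dyn ExecutionPlan) -> bool {
    match (left.statistics().num_rows, right.statistics().num_rows) {
        (Some(l), Some(r)) => l > r,
        _ => false, // unknown sizes: leave the join order alone
    }
}
```

In this sketch, if `MyTable::scan` did not forward the statistics, overriding `TableProvider::statistics` would have no effect on the rule, which matches the behavior described above.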
   
   
   ### Expectation
   
   The documentation should set appropriate expectations for what 
`TableProvider::statistics` is used for, so that developers can make informed 
choices about whether or not to implement it.
   
   ### Additional context
   
   The apparent answer to what `TableProvider::statistics` is used for is 
"nothing", based on the `cargo check --workspace --tests` result above, but 
removing the trait method is a breaking change.  Based on the Slack discussion 
prior to filing this issue, at least one user depends on 
`TableProvider::statistics` for their custom optimizer rules, and removing it 
would require them to find a workaround.
   
   Short of deprecating or removing the trait method, I would personally be 
satisfied just with updates to the method documentation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

