ozankabak commented on PR #19609: URL: https://github.com/apache/datafusion/pull/19609#issuecomment-3727520370
> If `evaluate_statistics()` and `propagate_statistics()` are intended to replace `evaluate_bounds()` and `propagate_constraints()`, and the latter two are no longer used inside the DataFusion core, should we simply deprecate them? They are not. The latter are the simplest building blocks that calculate ranges, the former are designed to use them to generate richer statistical information. The former are not used yet in the codebase, because we unfortunately didn't have time to work on it over the last 6 months or so, but have an intention to come back and finish the job. I didn't have time yet to look at this big PR, but I looked at the issue and design. As a general thought I think replacing the `evaluate_bounds`/`propagate_constraints` duo will not be easy. The `Interval` library makes very careful calculations w.r.t. things like rounding (with floats etc.), because their results are used to take branches. Support for such rounding-aware calculations were not present in `arrow-rs` at the time when we created these APIs (and still not here AFAICT). However, we have two things: 1. Rigor (w.r.t. containment) in bounds is a real need in some use cases (like join pruning). 2. Vectorization is a real need in other use cases (like Parquet pruning). A good first step is to add `evaluate_bounds_vectorized`, and note these nuances in the docs. Then, if you think it is possible to somehow achieve containment rigor w.r.t. rounding in the vectorized version too, we can try our hand at that, and then implement its `propagate_constraints` counterpart. Only at that stage we can remove the `evaluate_bounds`/`propagate_constraints` duo. Then, `evaluate_statistics()` and `propagate_statistics()` can migrate to the new API. However, my hunch is that this won't be possible for the time being, and we will need to a new API that does vectorized bounds, and make it clear in the docs that we need the two APIs to cater to 1 and 2. I hope this context helps @2010YOUY01 and @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
