ozankabak commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743175661
This API, as it currently stands, does not seem to make sense. It appears to assume that the outcomes (i.e. the individual items in the range) of the `Distribution`s are equally likely, which is not necessarily the case.

We can only merge two statistical objects in certain special circumstances. For example, if a statistical object tracks a sample average along with its count, we can merge two instances of it. Our distributions are not mergeable quantities in this sense. They are *mixable* (with a given weight), but not *mergeable*.

One of the follow-ups we discussed previously was adding a `HistogramDistribution` object that tracks bins and ranges. Such objects will be mergeable. Therefore, we should first add a `HistogramDistribution` object and then add a `merge` API to it. If you think we should also have a `mix` API for the general `Distribution` object, we can add that too; such an API will need to include a mixing weight in its signature.
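To make the merge-versus-mix distinction concrete, here is a minimal sketch in Rust. None of these types come from DataFusion: `MeanCount`, `Histogram`, and `mix_mean_variance` are hypothetical names used only for illustration, and the sketch summarizes a distribution by just its mean and variance. A mean-with-count merges exactly because the counts supply the weights. Histograms that share bin edges merge by adding per-bin counts. Two general distributions can only be *mixed*, and the caller must supply the weight. The mixture variance uses the standard law of total variance.

```rust
/// Hypothetical summary: a sample average together with its sample count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MeanCount {
    mean: f64,
    count: u64,
}

impl MeanCount {
    /// Merging is exact: the counts act as the weights of the combined average.
    fn merge(&self, other: &MeanCount) -> MeanCount {
        let count = self.count + other.count;
        if count == 0 {
            return MeanCount { mean: 0.0, count: 0 };
        }
        let mean = (self.mean * self.count as f64 + other.mean * other.count as f64)
            / count as f64;
        MeanCount { mean, count }
    }
}

/// Hypothetical histogram: `edges` has one more element than `counts`.
#[derive(Debug, Clone, PartialEq)]
struct Histogram {
    edges: Vec<f64>,
    counts: Vec<u64>,
}

impl Histogram {
    /// Histograms with identical bin edges merge by adding per-bin counts.
    /// Differing edges would require re-binning, which this sketch does not attempt.
    fn merge(&self, other: &Histogram) -> Option<Histogram> {
        if self.edges != other.edges {
            return None;
        }
        let counts = self
            .counts
            .iter()
            .zip(&other.counts)
            .map(|(a, b)| a + b)
            .collect();
        Some(Histogram { edges: self.edges.clone(), counts })
    }
}

/// Mixes two distributions given as (mean, variance). `weight` is the probability
/// of drawing from the first one; it must be supplied, since nothing else implies it.
fn mix_mean_variance(a: (f64, f64), b: (f64, f64), weight: f64) -> (f64, f64) {
    assert!((0.0..=1.0).contains(&weight), "mixing weight must lie in [0, 1]");
    let (ma, va) = a;
    let (mb, vb) = b;
    let mean = weight * ma + (1.0 - weight) * mb;
    // Law of total variance via second moments: E[X^2] - E[X]^2.
    let second_moment = weight * (va + ma * ma) + (1.0 - weight) * (vb + mb * mb);
    (mean, second_moment - mean * mean)
}

fn main() {
    let merged = MeanCount { mean: 2.0, count: 2 }.merge(&MeanCount { mean: 5.0, count: 1 });
    println!("merged mean = {}, count = {}", merged.mean, merged.count);

    let h1 = Histogram { edges: vec![0.0, 1.0, 2.0], counts: vec![1, 2] };
    let h2 = Histogram { edges: vec![0.0, 1.0, 2.0], counts: vec![3, 4] };
    println!("merged histogram counts = {:?}", h1.merge(&h2).map(|h| h.counts));

    let (m, v) = mix_mean_variance((0.0, 1.0), (10.0, 1.0), 0.5);
    println!("mixture mean = {m}, variance = {v}");
}
```

Note that merging the two `MeanCount` values gives the same mean as mixing them with weight 2/3. In other words, a merge is a mix whose weight is implied by the counts. A bare distribution carries no such counts, which is why a general `mix` API would need an explicit weight parameter.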