ozankabak commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743175661

   This API, as it currently stands, does not seem to make sense. It assumes 
that outcomes (i.e. individual items in the range) of the `Distribution`s are 
equally likely, which is not necessarily the case.
   
   We can only merge two statistical objects in certain special circumstances. 
For example, if we have a statistical object that tracks sample averages along 
with counts, we can merge two instances of them. Our distributions are not 
merge-able quantities in this sense. They are *mixable* (with a given weight), 
but not *merge-able*.
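
   To illustrate the sample-average example, here is a minimal sketch of a merge-able summary. The `MeanWithCount` type and its `merge` method are hypothetical names used only for illustration; they are not part of DataFusion. The point is that the counts carry the information needed to weigh each side, which a bare `Distribution` does not have.

```rust
/// Hypothetical summary (not a DataFusion type): a sample mean together
/// with the number of samples it was computed from.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MeanWithCount {
    mean: f64,
    count: u64,
}

impl MeanWithCount {
    /// Merging is well defined because each side's count tells us how
    /// much its mean should weigh in the combined result.
    fn merge(self, other: Self) -> Self {
        let count = self.count + other.count;
        if count == 0 {
            // Two empty summaries merge into an empty summary.
            return Self { mean: 0.0, count: 0 };
        }
        let weighted_sum =
            self.mean * self.count as f64 + other.mean * other.count as f64;
        Self {
            mean: weighted_sum / count as f64,
            count,
        }
    }
}
```

   Without the counts, the same two means could be combined in infinitely many ways, depending on how much each side is assumed to weigh.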
   
   One of the follow-ups we previously discussed was adding a 
`HistogramDistribution` object that tracks bins and ranges. These objects will 
be merge-able. Therefore, we should start off by adding a 
`HistogramDistribution` object first. Then, we can add a `merge` API to that 
object.
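
   As a rough sketch of why a histogram would be merge-able: if two histograms share the same bin edges, merging reduces to adding the per-bin counts. The `HistogramSketch` type below is an assumption made up for illustration; the actual `HistogramDistribution` design (bin layout, handling of mismatched edges, and so on) has not been specified here.

```rust
/// Hypothetical stand-in for a histogram-based distribution.
/// `edges` holds n + 1 bin boundaries, `counts` holds n per-bin counts.
#[derive(Debug, Clone, PartialEq)]
struct HistogramSketch {
    edges: Vec<f64>,
    counts: Vec<u64>,
}

impl HistogramSketch {
    /// Merges two histograms with identical bin edges by summing counts.
    /// Returns `None` when the bins differ; re-binning is out of scope
    /// for this sketch.
    fn merge(&self, other: &Self) -> Option<Self> {
        if self.edges != other.edges || self.counts.len() != other.counts.len() {
            return None;
        }
        let counts = self
            .counts
            .iter()
            .zip(&other.counts)
            .map(|(a, b)| a + b)
            .collect();
        Some(Self {
            edges: self.edges.clone(),
            counts,
        })
    }
}
```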
   
   If you think we should have a `mix` API for the general `Distribution` 
object, we can add it too. Such an API will need to include a mixing weight in 
its signature.
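
   For reference, a `mix` signature could look roughly like the sketch below. It is not a proposal for the actual `Distribution` API: `Moments` and `mix` are hypothetical names, and the sketch tracks only mean and variance, using the standard formulas for a two-component mixture.

```rust
/// Hypothetical summary of a distribution by its first two moments.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Moments {
    mean: f64,
    variance: f64,
}

/// Mixes `a` and `b`, drawing from `a` with probability `weight` and
/// from `b` with probability `1 - weight`.
/// Returns `None` if `weight` is outside [0, 1] (including NaN).
fn mix(a: Moments, b: Moments, weight: f64) -> Option<Moments> {
    if !(0.0..=1.0).contains(&weight) {
        return None;
    }
    let mean = weight * a.mean + (1.0 - weight) * b.mean;
    // E[X^2] of the mixture is the weighted sum of each component's E[X^2].
    let second_moment = weight * (a.variance + a.mean * a.mean)
        + (1.0 - weight) * (b.variance + b.mean * b.mean);
    Some(Moments {
        mean,
        variance: second_moment - mean * mean,
    })
}
```

   The explicit `weight` parameter is the key difference from a `merge`: the caller has to supply the information that the distributions themselves do not carry.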


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

