Do you know of any Python implementation of this?
Thanks,
pavan
On 4/17/17, 9:54 AM, svjk24 wrote:
Hello,
Is there any interest in an efficient distributed algorithm for computing the
median?
A Google search turns up some Stack Overflow discussion, but it would be
good to have an implementation provided.
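One exact approach is to binary-search on the value itself, which needs nothing more than mergeable counts. Nobody in this thread proposed it; it is just a common technique. In each round, every partition reports how many of its values are <= a pivot. In Spark that would be one filter-and-count job, so the total cost is about log2(max - min) passes over the data. The sketch below simulates partitions as plain Python lists. The function names are hypothetical, and it assumes integer values. Floats would need a different stopping rule.

```python
from typing import List, Sequence

def count_at_most(partition: Sequence[int], pivot: int) -> int:
    # Runs locally on one partition; only this integer goes back to the driver.
    return sum(1 for x in partition if x <= pivot)

def distributed_lower_median(partitions: List[Sequence[int]]) -> int:
    """Exact lower median of integer data split across partitions (sketch)."""
    non_empty = [p for p in partitions if p]
    n = sum(len(p) for p in non_empty)
    if n == 0:
        raise ValueError("no data")
    k = (n + 1) // 2                      # 1-based rank of the lower median
    lo = min(min(p) for p in non_empty)   # one pass for the value range
    hi = max(max(p) for p in non_empty)
    # Invariant: the k-th smallest value lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(count_at_most(p, mid) for p in non_empty) >= k:
            hi = mid                      # at least k values <= mid: answer <= mid
        else:
            lo = mid + 1                  # fewer than k values <= mid: answer > mid
    return lo
```

For example, `distributed_lower_median([[5, 1, 9], [3, 7], [2, 8, 6, 4]])` returns 5. The result is exact, unlike the sketches discussed below, but it needs several passes where they need a single one.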
Also, q-tree is implemented in Algebird, and it is not hard to get it going in Spark.
It is another probabilistic data structure that is useful for this.
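The sketch below is not Algebird's QTree and does not use its API. It is a much simpler fixed-resolution histogram. It shares the properties that make q-tree-like structures friendly to distribution. First, per-partition summaries merge by adding counts. Second, a quantile query returns a bounding interval rather than a single value. As I understand it, a real q-tree also keeps counts at several dyadic resolutions and prunes sparse nodes to bound memory. The bucket `width` and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

def summarize(partition: Iterable[float], width: float) -> Counter:
    """Per-partition summary: how many values fall in each bucket [k*width, (k+1)*width)."""
    return Counter(math.floor(x / width) for x in partition)

def merge_summaries(summaries: List[Counter]) -> Counter:
    # Mergeable: combining partitions is just adding bucket counts.
    return reduce(lambda a, b: a + b, summaries, Counter())

def quantile_bounds(summary: Counter, p: float, width: float) -> Tuple[float, float]:
    """Bucket [lo, hi) that should contain the nearest-rank p-quantile."""
    total = sum(summary.values())
    if total == 0:
        raise ValueError("empty summary")
    target = max(1, math.ceil(p * total))   # 1-based nearest rank
    seen = 0
    for k in sorted(summary):
        seen += summary[k]
        if seen >= target:
            return k * width, (k + 1) * width
    raise AssertionError("unreachable for 0 <= p <= 1")
```

Memory per partition grows with the value range divided by `width`, not with the data size. Choosing `width` therefore trades accuracy against summary size. The dyadic levels of a q-tree are meant to avoid committing to one resolution up front.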
On Apr 17, 2017 11:27, "Jason White" wrote:
> Have you looked at t-digests?
>
> Calculating percentiles (including medians) is something that is inherently ...
The DataFrame API includes an approximate quantile implementation. If you
ask for quantile 0.5, you will get an approximate median.
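In PySpark this is `DataFrame.approxQuantile(col, probabilities, relativeError)`, available since Spark 2.0, which also answers the Python question above. For example, `df.approxQuantile("value", [0.5], 0.01)` for a hypothetical column named `value`. The Spark docs state that the returned x satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The small pure-Python check below shows what that bound allows for a given relativeError. The helper names are hypothetical and are not part of Spark. The check assumes distinct values and takes rank(x) to be the count of values <= x.

```python
import math
from typing import Sequence, Tuple

def rank_bounds(n: int, p: float, err: float) -> Tuple[int, int]:
    """Allowed rank interval for an approximate p-quantile with relative error err."""
    return math.floor((p - err) * n), math.ceil((p + err) * n)

def is_acceptable(data: Sequence[float], x: float, p: float, err: float) -> bool:
    # rank(x) taken as the number of values <= x, i.e. the 1-based position of x
    # when values are distinct; how Spark counts ties is not modelled here.
    rank = sum(1 for v in data if v <= x)
    lo, hi = rank_bounds(len(data), p, err)
    return lo <= rank <= hi
```

With 1000 distinct values, p = 0.5 and err = 0.01, any value ranked 490 to 510 is a valid "approximate median". Setting relativeError to 0 asks Spark for the exact quantile, which the docs warn can be expensive.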
On Sun, Apr 16, 2017 at 9:24 PM svjk24 wrote:
> Hello,
> Is there any interest in an efficient distributed computation of the
> median algorithm?
> A Google search
Have you looked at t-digests?
Calculating percentiles (including medians) is inherently
difficult/inefficient to do in a distributed system. T-digests provide a
useful probabilistic structure that lets you compute any percentile with a
known (and tunable) margin of error.
http
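Below is a much-simplified t-digest in Python, for illustration only. The class and method names are hypothetical. This is neither the reference implementation (Ted Dunning's Java library) nor any Python package's API. It keeps a sorted list of (mean, weight) centroids. It caps each centroid near quantile q at roughly 4 * N * q * (1 - q) / compression points, approximately the size bound used in early t-digest versions, so the tails stay fine-grained and the middle is coarser. Digests built on separate partitions merge by pooling centroids and re-compressing.

```python
from typing import Iterable, List, Tuple

class SimpleTDigest:
    """Simplified merging t-digest: sorted (mean, weight) centroids (sketch only)."""

    def __init__(self, compression: float = 100.0):
        self.compression = compression
        self.centroids: List[Tuple[float, float]] = []

    def add_all(self, values: Iterable[float]) -> "SimpleTDigest":
        self.centroids.extend((float(v), 1.0) for v in values)
        self._compress()
        return self

    def merge(self, other: "SimpleTDigest") -> "SimpleTDigest":
        # Partition digests combine by pooling centroids and re-compressing.
        out = SimpleTDigest(self.compression)
        out.centroids = self.centroids + other.centroids
        out._compress()
        return out

    def _compress(self) -> None:
        if not self.centroids:
            return
        cs = sorted(self.centroids)
        total = sum(w for _, w in cs)
        merged: List[Tuple[float, float]] = []
        before = 0.0                       # weight of centroids already emitted
        mean, weight = cs[0]
        for m, w in cs[1:]:
            # Quantile at the centre of the would-be merged centroid.
            q = (before + (weight + w) / 2) / total
            limit = 4 * total * q * (1 - q) / self.compression
            if weight + w <= limit:
                mean = (mean * weight + m * w) / (weight + w)
                weight += w
            else:
                merged.append((mean, weight))
                before += weight
                mean, weight = m, w
        merged.append((mean, weight))
        self.centroids = merged

    def quantile(self, p: float) -> float:
        if not self.centroids:
            raise ValueError("empty digest")
        total = sum(w for _, w in self.centroids)
        target = p * total
        # Position of each centroid's centre on the cumulative-weight axis.
        centres: List[Tuple[float, float]] = []
        cum = 0.0
        for m, w in self.centroids:
            centres.append((cum + w / 2, m))
            cum += w
        if target <= centres[0][0]:
            return centres[0][1]
        if target >= centres[-1][0]:
            return centres[-1][1]
        for (x0, m0), (x1, m1) in zip(centres, centres[1:]):
            if x0 <= target <= x1:
                # Linear interpolation between neighbouring centroid means.
                return m0 + (m1 - m0) * (target - x0) / (x1 - x0)
        return centres[-1][1]
```

Usage: build one digest per partition with `SimpleTDigest().add_all(part)`, combine them pairwise with `merge`, then call `quantile(0.5)`. A larger `compression` gives more centroids and a smaller error.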