Have you looked at t-digests?

Computing exact percentiles (including the median) is inherently expensive
in a distributed system, because the answer depends on the global ordering
of the data and cannot simply be combined from per-partition results.
T-digests are a probabilistic data structure that lets you estimate any
percentile with a known (and tunable) margin of error.

https://github.com/tdunning/t-digest
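
To make the idea concrete, here is a rough Python sketch of a mergeable
quantile summary. This is not the real t-digest algorithm and not the API of
the library linked above: ToySketch, max_centroids and the other names are
made up for illustration, and it uses equal-weight clusters, whereas a real
t-digest keeps smaller clusters near the tails. It only shows the pattern:
build one small summary per partition, merge the summaries, then query the
merged one.

```python
import random
import statistics


class ToySketch:
    """Toy mergeable quantile summary (hypothetical, NOT the real t-digest)."""

    def __init__(self, max_centroids=200):
        self.max_centroids = max_centroids
        self.centroids = []  # list of (mean, count) pairs

    def add(self, x):
        self.centroids.append((float(x), 1))
        # Compress lazily so the summary stays small.
        if len(self.centroids) > 4 * self.max_centroids:
            self._compress()

    def merge(self, other):
        # Merging only needs the two summaries, not the raw data.
        merged = ToySketch(self.max_centroids)
        merged.centroids = self.centroids + other.centroids
        merged._compress()
        return merged

    def _compress(self):
        # Sort by mean, then greedily group neighbours into clusters
        # holding at most total / max_centroids points each.
        self.centroids.sort()
        total = sum(count for _, count in self.centroids)
        limit = total / self.max_centroids
        out = []
        cur_sum, cur_count = 0.0, 0
        for mean, count in self.centroids:
            if cur_count and cur_count + count > limit:
                out.append((cur_sum / cur_count, cur_count))
                cur_sum, cur_count = 0.0, 0
            cur_sum += mean * count
            cur_count += count
        if cur_count:
            out.append((cur_sum / cur_count, cur_count))
        self.centroids = out

    def quantile(self, q):
        if not self.centroids:
            raise ValueError("empty sketch")
        self._compress()
        total = sum(count for _, count in self.centroids)
        target = q * total
        seen = 0
        # Return the mean of the cluster in which the target rank falls.
        for mean, count in self.centroids:
            seen += count
            if seen >= target:
                return mean
        return self.centroids[-1][0]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(40_000)]

# Pretend each slice lives on a different worker.
partitions = [data[i::4] for i in range(4)]
sketches = []
for part in partitions:
    sketch = ToySketch()
    for value in part:
        sketch.add(value)
    sketches.append(sketch)

# Only the small summaries are shipped to the driver and merged there.
merged = sketches[0]
for sketch in sketches[1:]:
    merged = merged.merge(sketch)

estimate = merged.quantile(0.5)
exact = statistics.median(data)
print(f"approximate median: {estimate:.4f}, exact median: {exact:.4f}")
```

Each worker ships a few hundred (mean, count) pairs instead of its raw data,
and raising max_centroids trades memory for accuracy. For real use you would
want the actual t-digest implementation rather than this toy.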



