Re: Approximate rank-based statistics (median, 95th percentile, etc.) for Spark

2015-06-10 Thread Grega Kešpret
type=pdf
> by Bruce Lindsay, IBM
>
> http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf
>
> On Mon, Apr 6, 2015 at 12:50 AM, Grega Kešpret wrote:
>> Hi!
>>
>> I'd like to get the community's opinion on implementing

Approximate rank-based statistics (median, 95th percentile, etc.) for Spark

2015-04-06 Thread Grega Kešpret
hub.com/tdunning/t-digest>, implement the serialization/deserialization boilerplate, and provide

    def cdf(x: Double): Double
    def quantile(q: Double): Double

on RDD[Double] and RDD[(K, Double)].

Let me know what you think. Any other ideas or suggestions are also welcome!

Best,
Grega
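The proposal boils down to folding values into a mergeable summary per partition and then merging the partial summaries, which is the `seqOp`/`combOp` shape that Spark's `RDD.aggregate` expects. Below is a minimal sketch of that shape under stated assumptions: `BinnedSummary`, `BinnedSummary.empty` and `BinnedSummary.aggregate` are hypothetical names, and the summary is a fixed-range histogram deliberately swapped in for t-digest so the sketch runs without dependencies. Unlike t-digest, it assumes the value range `[lo, hi]` is known up front.

```scala
// Minimal sketch (hypothetical names): a mergeable approximate-quantile summary
// exposing add (seqOp), merge (combOp), cdf and quantile.
// NOTE: this is a fixed-range histogram swapped in for t-digest so the sketch
// has no dependencies; it assumes values lie in [lo, hi] (others are clamped
// into the edge bins).
final case class BinnedSummary(lo: Double, hi: Double, counts: Vector[Long]) {
  require(hi > lo, "hi must be greater than lo")
  require(counts.nonEmpty, "need at least one bin")

  private def width: Double = (hi - lo) / counts.length

  // Bin index for x, clamped to the first/last bin for out-of-range values.
  private def binOf(x: Double): Int =
    math.min(math.max(((x - lo) / width).toInt, 0), counts.length - 1)

  def total: Long = counts.sum

  // seqOp: fold one value into the summary (returns a new summary).
  def add(x: Double): BinnedSummary = {
    val i = binOf(x)
    copy(counts = counts.updated(i, counts(i) + 1))
  }

  // combOp: merge two summaries built over identical bins.
  def merge(other: BinnedSummary): BinnedSummary = {
    require(lo == other.lo && hi == other.hi && counts.length == other.counts.length,
      "summaries must share the same bins")
    copy(counts = counts.zip(other.counts).map { case (a, b) => a + b })
  }

  // Approximate fraction of values <= x, assuming values are spread
  // uniformly inside each bin.
  def cdf(x: Double): Double = {
    val n = total
    if (n == 0) Double.NaN
    else if (x < lo) 0.0
    else if (x >= hi) 1.0
    else {
      val i = binOf(x)
      val below = counts.take(i).sum.toDouble
      val frac = (x - (lo + i * width)) / width
      (below + frac * counts(i)) / n
    }
  }

  // Approximate q-quantile: walk the bins until the cumulative count reaches
  // rank q * n, then interpolate linearly inside that bin.
  def quantile(q: Double): Double = {
    require(q >= 0.0 && q <= 1.0, "q must be in [0, 1]")
    val n = total
    if (n == 0) Double.NaN
    else {
      val target = q * n
      var cum = 0.0
      var i = 0
      while (i < counts.length - 1 && cum + counts(i) < target) {
        cum += counts(i)
        i += 1
      }
      val inBin = counts(i)
      val frac = if (inBin == 0) 0.0 else (target - cum) / inBin
      lo + (i + math.min(math.max(frac, 0.0), 1.0)) * width
    }
  }
}

object BinnedSummary {
  def empty(lo: Double, hi: Double, bins: Int): BinnedSummary =
    BinnedSummary(lo, hi, Vector.fill(bins)(0L))

  // Local stand-in for rdd.aggregate(zero)(seqOp, combOp): fold each
  // "partition" separately, then merge the partial summaries.
  def aggregate(partitions: Seq[Seq[Double]], zero: BinnedSummary): BinnedSummary =
    partitions
      .map(_.foldLeft(zero)((acc, x) => acc.add(x)))
      .foldLeft(zero)((a, b) => a.merge(b))
}
```

In Spark, the same two functions would plug in roughly as `rdd.aggregate(BinnedSummary.empty(lo, hi, bins))((s, x) => s.add(x), (a, b) => a.merge(b))`, and into `aggregateByKey` for RDD[(K, Double)]. Case classes are serializable, so the summary can be shipped between executors. A t-digest-backed summary would keep the same add/merge/cdf/quantile shape while dropping the fixed-range assumption.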