Hi!

I'd like to get the community's opinion on implementing a generic quantile
approximation algorithm for Spark that is O(n) and requires limited memory.
I would find it useful and I haven't found any existing implementation. The
plan was basically to wrap t-digest <https://github.com/tdunning/t-digest>,
implement the serialization/deserialization boilerplate and provide

def cdf(x: Double): Double
def quantile(q: Double): Double


on RDD[Double] and RDD[(K, Double)].
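
As a rough sketch of what the wrapper could look like (not a finished design): this assumes t-digest 3.2 or later, where I believe TDigest is java.io.Serializable, so Spark can ship the zero value and the partial digests without extra boilerplate. On older versions the digests would need explicit byte-buffer serialization instead. The names TDigestSketch, digest and digestByKey, and the default compression of 100, are placeholders for illustration only.

```scala
import scala.reflect.ClassTag

import com.tdunning.math.stats.TDigest
import org.apache.spark.rdd.RDD

object TDigestSketch {

  // One digest per partition, merged pairwise into a single digest.
  // Assumption: TDigest is Serializable (t-digest 3.2+), so Spark can
  // clone the zero value for every task.
  def digest(values: RDD[Double], compression: Double = 100.0): TDigest =
    values.aggregate(TDigest.createMergingDigest(compression))(
      (d, x) => { d.add(x); d }, // fold one value into the partition's digest
      (a, b) => { a.add(b); a }  // merge two partial digests
    )

  // Same idea per key, yielding one digest for each key.
  def digestByKey[K: ClassTag](
      values: RDD[(K, Double)],
      compression: Double = 100.0): RDD[(K, TDigest)] =
    values.aggregateByKey(TDigest.createMergingDigest(compression))(
      (d, x) => { d.add(x); d },
      (a, b) => { a.add(b); a }
    )
}
```

With that in place, cdf and quantile are just calls on the resulting digest, e.g. TDigestSketch.digest(rdd).quantile(0.5) for an approximate median, or TDigestSketch.digest(rdd).cdf(x) for the approximate fraction of values at or below x.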

Let me know what you think. Any other ideas/suggestions also welcome!

Best,
Grega
--
Grega Kešpret
Senior Software Engineer, Analytics

Skype: gregakespret
celtra.com <http://www.celtra.com/> | @celtramobile
<http://www.twitter.com/celtramobile>
