Hi!

I'd like to get the community's opinion on implementing a generic quantile approximation algorithm for Spark that runs in O(n) time and requires limited memory. I would find it useful, and I haven't found any existing implementation.

The plan is basically to wrap t-digest <https://github.com/tdunning/t-digest>, implement the serialization/deserialization boilerplate, and provide

    def cdf(x: Double): Double
    def quantile(q: Double): Double

on RDD[Double] and RDD[(K, Double)].

Let me know what you think. Any other ideas/suggestions are also welcome!

Best,
Grega

--
*Grega Kešpret*
Senior Software Engineer, Analytics
Skype: gregakespret
celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>
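
P.S. To make the idea concrete, here is a minimal sketch of how the RDD[Double] case could look. It is an illustration under assumptions, not a finished design: the object name `ApproxQuantiles`, the helper names, and the compression value of 100 are placeholders, and it assumes a t-digest release that provides `MergingDigest` with the `asBytes`/`fromBytes` byte-buffer methods. Each partition builds its own digest, ships it as bytes, and the driver-side result is the merge of all partition digests.

```scala
import java.nio.ByteBuffer

import com.tdunning.math.stats.{MergingDigest, TDigest}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper; names and defaults are placeholders for discussion.
object ApproxQuantiles {
  // Assumed compression setting: larger values use more memory for better accuracy.
  val Compression = 100.0

  // Serialize a digest into a byte array so it can be shuffled between tasks.
  def toBytes(d: TDigest): Array[Byte] = {
    val buf = ByteBuffer.allocate(d.byteSize())
    d.asBytes(buf)
    buf.array()
  }

  // Rebuild a digest from the bytes produced by toBytes.
  def fromBytes(bytes: Array[Byte]): TDigest =
    MergingDigest.fromBytes(ByteBuffer.wrap(bytes))

  // One pass over the data: one digest per partition, then a pairwise merge.
  // Assumes the RDD has at least one partition (reduce fails otherwise).
  def digest(rdd: RDD[Double]): TDigest = {
    val merged = rdd.mapPartitions { it =>
      val d = TDigest.createMergingDigest(Compression)
      it.foreach(x => d.add(x))
      Iterator.single(toBytes(d))
    }.reduce { (a, b) =>
      val d = fromBytes(a)
      d.add(fromBytes(b))
      toBytes(d)
    }
    fromBytes(merged)
  }
}
```

With that in place, `ApproxQuantiles.digest(rdd).quantile(0.5)` would give an approximate median and `ApproxQuantiles.digest(rdd).cdf(x)` the approximate fraction of values at or below `x`. The RDD[(K, Double)] case could follow the same pattern per key, for example via `aggregateByKey`, which I have left out here.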