Hi,

The T-Digest (https://github.com/tdunning/t-digest) data structure is
another way of computing sketches, rank-based statistics, and trimmed means
over numeric data. At my day job, we have been using a T-Digest-backed Druid
aggregator module, which has generally worked well for the use cases of the
teams relying on it. I think it would be valuable to have T-Digest-backed
aggregators in Druid alongside other sketch algorithms such as the moments
sketch and the Yahoo quantiles sketch.

T-Digest has also been adopted by other projects including:

Elasticsearch -
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html#search-aggregations-metrics-percentile-aggregation-approximation

stream-lib -
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/TDigest.java

Apache Mahout -
https://archive.cloudera.com/cdh5/cdh/5/mahout/mahout-math/org/apache/mahout/math/stats/TDigest.html

I have been working on cleaning up the module and improving its
performance, and I would like to contribute it. I would like to hear what
the community thinks about it.

Thanks,
Samarth
