Wow, I just realized I misstated things rather significantly.
We very much have tuple sketches in C++, but the Python wrapper for them is
a work in progress. I thought I had it ready, but it turns out there are
some pretty significant limitations with the wrapper we're using (pybind11)
that I now
Yes, Druid does this on top of the specialized Tuple sketch called
ArrayOfDoublesSketch (in Java).
Each key in the sketch has an array of floating-point values associated
with it.
PostAggregator functions can convert these columns into means and
variances using org.apache.commons.math3.stat.descrip
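To make the "array of values per key" idea concrete, here is a toy sketch in plain Python. It is only an illustration of the concept, not the DataSketches or Druid API: the class name ToyArrayOfDoublesSketch, its methods, and the SHA-256 hashing are all assumptions made up for this example. It keeps the k smallest key hashes, sums the value arrays of repeated keys (similar in spirit to a Sum mode), and scales the retained totals by 1/theta.

```python
import hashlib


class ToyArrayOfDoublesSketch:
    """Toy KMV-style sketch: one array of floats per retained key (hypothetical)."""

    def __init__(self, k=1024, num_values=1):
        self.k = k                    # maximum number of retained keys
        self.num_values = num_values  # length of each per-key array
        self.theta = 1.0              # sampling threshold in (0, 1]
        self.entries = {}             # key hash in [0, 1) -> list of summed values

    @staticmethod
    def _hash(key):
        # Map the key to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def update(self, key, values):
        if len(values) != self.num_values:
            raise ValueError("expected %d values" % self.num_values)
        h = self._hash(key)
        if h >= self.theta:
            return  # key falls outside the current sample
        current = self.entries.setdefault(h, [0.0] * self.num_values)
        for i, v in enumerate(values):
            current[i] += v  # repeated keys are combined by summation
        if len(self.entries) > self.k:
            # Evict the largest hash and lower theta to it.
            largest = max(self.entries)
            del self.entries[largest]
            self.theta = largest

    def estimate_distinct(self):
        # Retained key count scaled up by the sampling rate.
        return len(self.entries) / self.theta

    def estimate_column_sums(self):
        # Per-column totals over retained keys, scaled up by the sampling rate.
        if not self.entries:
            return [0.0] * self.num_values
        return [sum(col) / self.theta for col in zip(*self.entries.values())]
```

While fewer than k distinct keys have been seen, theta stays at 1.0 and both estimates are exact; beyond that they are approximations whose error shrinks as k grows.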
Thanks, yes! (Tuple sketch, not theta, as you said.)
I have another question, please. I looked at the tuple sketch docs here:
https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/tuple/aninteger/IntegerSummary.Mode.html
and I see that the possible values of mode are: Sum, Mi
I believe you're looking at the tuple sketch code in java, not theta
sketch. We don't yet have tuple support in C++ (on which the Python library
is based).
It's planned, but I haven't yet had time to sit down and figure out how to
do it -- and specifically how to do so with a reasonable API.
jon
In the Python API I can use 'update_theta_sketch' and then get_estimate to
get the unique count.
But how can I get the sum in Python for update_theta_sketch?
I see it's available in the non-Python datasketches here:
public static enum Mode -->
/**
* The aggregation mode is the summation function.