Re: Tuple sketches question

leerho Wed, 20 May 2020 15:29:10 -0700

Hi David,

Thank you for reaching out to us.  We are always interested in learning
about new users and new uses of the library, especially with Tuple
sketches, which we do not hear much feedback about.  Let me try to address
some of your questions:

The Tuple Sketch is an "extension" of the Theta Sketch theoretically, but
not programmatically.  The underlying sketch algorithms for updating with
new items, merging, and getting estimates and error bounds are identical
between the two sketches.  Their behaviour is identical with respect to
unique counting estimates and set operations.  However, they share hardly
any code.  The reason is that the Theta Sketch has been highly optimized
for performance, and rewriting it so that the Tuple Sketch could be a
programmatic extension would either impact the performance of the Theta
Sketch or greatly increase the complexity of the Theta Sketch code, which
is very heavily used.  It would be nice if we could, but we haven't had the
time to figure out a way to do that without extensive rewriting of both
sketches.
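
To illustrate what "identical theoretically" means, here is a minimal toy
sketch in Python.  It is not the library's code or API: the class names
ToyThetaSketch and ToyTupleSketch, the hash64 helper, and the use of SHA-256
as the hash are all assumptions made for illustration.  Both classes apply
the same threshold rule (keep at most k hashes below theta); the tuple
version only adds a summary per retained hash.

```python
import hashlib

MAX_HASH = 2**64  # hashes are integers in [0, 2**64); theta starts at the top


def hash64(item):
    """Stand-in hash (assumption): first 8 bytes of SHA-256 as an integer."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyThetaSketch:
    """Retains at most k hashes below theta; no per-key payload."""

    def __init__(self, k):
        self.k = k
        self.theta = MAX_HASH
        self.hashes = set()

    def update(self, item):
        h = hash64(item)
        if h >= self.theta or h in self.hashes:
            return  # rejected by the threshold, or a duplicate
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest hash and lower theta to it.
            self.theta = max(self.hashes)
            self.hashes.remove(self.theta)

    def estimate(self):
        return len(self.hashes) * MAX_HASH / self.theta


class ToyTupleSketch:
    """Same threshold logic, but each retained hash carries a summary."""

    def __init__(self, k):
        self.k = k
        self.theta = MAX_HASH
        self.entries = {}  # hash -> summary (here: a running sum)

    def update(self, item, value):
        h = hash64(item)
        if h >= self.theta:
            return
        if h in self.entries:
            self.entries[h] += value  # duplicate key: fold into the summary
            return
        self.entries[h] = value
        if len(self.entries) > self.k:
            self.theta = max(self.entries)
            del self.entries[self.theta]

    def estimate(self):
        return len(self.entries) * MAX_HASH / self.theta


theta_sketch = ToyThetaSketch(k=256)
tuple_sketch = ToyTupleSketch(k=256)
for i in range(20000):
    user = f"user-{i % 5000}"  # 5000 distinct users, each seen 4 times
    theta_sketch.update(user)
    tuple_sketch.update(user, 1)

# Same retained keys and same theta, hence the same estimate.
print(theta_sketch.estimate(), tuple_sketch.estimate())
```

Fed the same stream, the two toy sketches end up with the same retained
hashes and the same theta, so their unique-count estimates agree exactly.
The only difference is the summary attached to each key.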

So currently, there is no mixing of Theta Sketch objects with Tuple Sketch
objects.  And, currently, there is no mechanism for easily converting Theta
Sketches to Tuple Sketches or vice versa.

Having said that, we have thought about having, for example, a Tuple Sketch
constructor that would accept a Theta Sketch as an argument and instantiate
the Tuple sketch with the same hash keys and *theta* as the Theta Sketch,
and empty (or default) Summary objects.

Another option we have thought about would be to allow a Tuple Sketch Union
to have an *update(ThetaSketch)* method where the *Summary* would be a
default, and perhaps to specify the mode.
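
Neither of these two ideas exists in the library today.  The toy Python
below only sketches what they could mean.  Everything in it is hypothetical:
ToyTheta, ToyTuple, make_theta, tuple_from_theta, union_update_theta, and the
SHA-256 stand-in hash are illustrative names, and the toy sketches use a
fixed theta instead of a nominal size k to keep the example short.

```python
import hashlib
from dataclasses import dataclass, field

MAX_HASH = 2**64


def hash64(item):
    """Stand-in hash (assumption): first 8 bytes of SHA-256 as an integer."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class ToyTheta:
    theta: int = MAX_HASH
    hashes: set = field(default_factory=set)


@dataclass
class ToyTuple:
    theta: int = MAX_HASH
    entries: dict = field(default_factory=dict)  # hash -> summary

    def estimate(self):
        return len(self.entries) * MAX_HASH / self.theta


def make_theta(items, theta):
    """Build a toy theta sketch at a fixed threshold (simplification: no k)."""
    return ToyTheta(theta, {h for h in map(hash64, items) if h < theta})


def tuple_from_theta(sketch, default_summary=1):
    """Idea 1: same hash keys and theta, every summary set to a default."""
    return ToyTuple(sketch.theta, {h: default_summary for h in sketch.hashes})


def union_update_theta(union, sketch, default_summary=1,
                       combine=lambda a, b: a + b):
    """Idea 2: fold a theta sketch into a tuple union with a default summary.

    `combine` plays the role of the mode (here: sum).
    """
    union.theta = min(union.theta, sketch.theta)
    for h in sketch.hashes:
        if h in union.entries:
            union.entries[h] = combine(union.entries[h], default_summary)
        else:
            union.entries[h] = default_summary
    # Keep only the keys still below the (possibly lowered) theta.
    union.entries = {h: s for h, s in union.entries.items() if h < union.theta}


a = make_theta((f"user-{i}" for i in range(4000)), MAX_HASH // 4)
b = make_theta((f"user-{i}" for i in range(2000, 6000)), MAX_HASH // 8)

converted = tuple_from_theta(a)

union = ToyTuple()
union_update_theta(union, a)
union_update_theta(union, b)

print(converted.estimate(), union.estimate())
```

In this toy, the converted sketch has exactly the keys and theta of the
original, so its estimate is unchanged.  The union ends at the smaller of
the two thetas, and with the sum mode a key present in both inputs carries a
summary of 2.  A real union would also trim to its nominal size, which is
left out here.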

I'm not sure I understand the need to convert a Tuple Sketch back into a
Theta Sketch.  If you have a use case for this, please elaborate!

Quite honestly, we haven't seen very many requests for this, but with your
help perhaps we could collaborate and make these enhancements.

Cheers,

Lee.

On Wed, May 20, 2020 at 3:52 AM David Cromberge <
david.crombe...@permutive.com> wrote:

> Hi everyone,
>
> We have a significantly large dataset that is already represented by theta
> sketches, across a variety of dimensions.
> The theta sketch is chosen because the type of queries that are presented
> often involve many set operations, to filter down the data according to
> various dimension combinations.
> Recently, there is a compelling use case to adopt tuple sketches for some
> of the properties of the data.  These tuple sketches would also need to be
> filtered according to the same query criteria as the theta sketches.
> My initial assumption is that we would need to replicate all of our
> existing data as tuple sketches to accommodate this filtering.  Since tuple
> sketches are an extension of the theta idea, is it accurate to regard a
> tuple sketch where the summary mode
> is ALWAYS_ONE as isomorphic to the theta sketch, in accuracy and behaviour?
> Converting our entire dataset over to tuple sketches is not an option, and
> I also am unconvinced that a theta sketch can be easily converted to a
> tuple sketch on the fly, using the aforementioned summary property.
>
> Would anyone be able to verify my assumptions in this case, namely:
> - the theta sketch and tuple sketch cannot be combined
> - the tuple sketch cannot be converted from and to a theta sketch
> - it is necessary to replicate an entire dataset into tuple sketches to
> use common set operations across the same dimensions
>
> Thank you for open-sourcing this library, and for any help with regard to
> the above,
> David
>
>
