Hi David,

Thank you for reaching out to us. We are always interested in learning about new users and new uses of the library, especially with Tuple sketches, which we do not hear much feedback about. Let me try to address some of your questions:

The Tuple Sketch is an "extension" of the Theta Sketch theoretically, but not programmatically. The underlying algorithms for updating with new items, merging, and getting estimates and error bounds are identical between the two sketches, so their behaviour is identical with respect to unique counting estimates and set operations. However, they share hardly any code. The reason is that the Theta Sketch has been highly optimized for performance. Rewriting it so that the Tuple Sketch could be a programmatic extension would either impact the performance of the Theta Sketch or greatly increase the complexity of the Theta Sketch code, which is very heavily used. It would be nice if we could, but we haven't had the time to figure out a way to do that without extensive rewriting of both sketches.

So currently, there is no mixing of Theta Sketch objects with Tuple Sketch objects, and there is no mechanism for easily converting Theta Sketches to Tuple Sketches or vice versa.

Having said that, we have thought about having, for example, a Tuple Sketch constructor that would accept a Theta Sketch as an argument and instantiate the Tuple Sketch with the same hash keys and *theta* as the Theta Sketch, and empty (or default) Summary objects. Another option we have thought about would be to allow a Tuple Sketch Union to have an *update(Theta Sketch)* method where the *Summary* was a default, and perhaps specify the mode.

I'm not sure I understand the need to convert a Tuple Sketch back into a Theta Sketch. If you have a use-case for this, please elaborate!

Quite honestly, we haven't seen very many requests for this, but with your help perhaps we could collaborate and make these enhancements.

Cheers,

Lee.

On Wed, May 20, 2020 at 3:52 AM David Cromberge <david.crombe...@permutive.com> wrote:

> Hi everyone,
>
> We have a significantly large dataset that is already represented by theta
> sketches, across a variety of dimensions.
> The theta sketch is chosen because the type of queries that are presented
> often involve many set operations, to filter down the data according to
> various dimension combinations.
> Recently, there is a compelling use case to adopt tuple sketches for some
> of the properties of the data. These tuple sketches would also need to be
> filtered according to the same query criteria as the theta sketches.
> My initial assumption is that we would need to replicate all of our
> existing data as tuple sketches to accommodate this filtering. Since tuple
> sketches are an extension of the theta idea, is it accurate to regard a
> tuple sketch where the summary mode is ALWAYS_ONE as isomorphic to the
> theta sketch, in accuracy and behaviour?
> Converting our entire dataset over to tuple sketches is not an option, and
> I also am unconvinced that a theta sketch can be easily converted to a
> tuple sketch on the fly, using the aforementioned summary property.
>
> Would anyone be able to verify my assumptions in this case, namely:
> - the theta sketch and tuple sketch can not be combined
> - the tuple sketch cannot be converted from and to a theta sketch
> - it is necessary to replicate an entire dataset into tuple sketches to
>   use common set operations across the same dimensions
>
> Thank you for open-sourcing this library, and for any help with regard to
> the above,
> David
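
As a rough illustration of the constructor idea mentioned in the reply (a Tuple Sketch built from a Theta Sketch's hash keys and theta, with default summaries), here is a toy Python model. It does not use the DataSketches library or its API: `ToyThetaSketch`, `ToyTupleSketch`, `from_theta`, the SHA-256-based `hash64`, and the default summary value of 1 are all hypothetical names and assumptions chosen for the sketch.

```python
import hashlib
from dataclasses import dataclass, field

MAX_HASH = 2**64  # hashes are treated as integers in [0, 2**64)


def hash64(item: str) -> int:
    """Toy 64-bit hash; a stand-in, not the hash the real library uses."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class ToyThetaSketch:
    """Toy theta sketch: retains the k smallest hashes seen so far."""
    k: int
    theta: int = MAX_HASH
    hashes: set = field(default_factory=set)

    def update(self, item: str) -> None:
        h = hash64(item)
        if h >= self.theta or h in self.hashes:
            return  # above the threshold, or a duplicate
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest retained hash; it becomes the new theta.
            self.theta = max(self.hashes)
            self.hashes.remove(self.theta)

    def estimate(self) -> float:
        # Retained count divided by the sampled fraction of the hash space.
        return len(self.hashes) / (self.theta / MAX_HASH)


@dataclass
class ToyTupleSketch:
    """Toy tuple sketch: hash keys and theta, plus one summary per key."""
    theta: int
    summaries: dict  # hash key -> summary value

    @classmethod
    def from_theta(cls, sketch: ToyThetaSketch, default_summary=1):
        # Hypothetical conversion: copy keys and theta, attach a default summary.
        return cls(theta=sketch.theta,
                   summaries={h: default_summary for h in sketch.hashes})

    def estimate(self) -> float:
        return len(self.summaries) / (self.theta / MAX_HASH)


theta_sketch = ToyThetaSketch(k=64)
for i in range(10_000):
    theta_sketch.update(f"user-{i}")

tuple_sketch = ToyTupleSketch.from_theta(theta_sketch)
```

In this toy model the converted sketch keeps exactly the same keys and theta, so its unique-count estimate matches the source sketch. Whether and how the real library would offer such a conversion is the open question discussed in the thread.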