Re: Apache Impala integration with DataSketches HLL (C++)

2020-04-27 Thread leerho
Hi Gabor,
> My quick question would be that taking into account that the order of the items provided to datasketches:hll_sketch is not deterministic is it normal behaviour that for the same dataset I get a different estimate each time I run my query?
> I'm trying to figure out if this is due t
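The question hinges on whether HLL state depends on insertion order. As a point of reference, and not a statement about how datasketches::hll_sketch or Impala actually behave, consider a textbook HyperLogLog. It stores only a per-register maximum, and max is commutative and associative. As a result, the register array, and any estimator computed purely from it, is independent of both the order of updates and the order in which partial sketches are merged. The C++ sketch below is a toy built on stated assumptions. `mix64` (a SplitMix64-style mixer) stands in for a real hash, `kLgK = 10` is arbitrary, and the names `update`, `merge`, `estimate` and `sketch_of` are hypothetical helpers, not the DataSketches API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy HyperLogLog with 2^kLgK registers. Illustrative only: this is NOT the
// datasketches::hll_sketch implementation (no HIP estimator, no sparse
// modes, only a small-range linear-counting fallback).
constexpr int kLgK = 10;
constexpr std::size_t kNumRegisters = std::size_t{1} << kLgK;

using Registers = std::vector<std::uint8_t>;

// SplitMix64-style 64-bit mixer, an assumed stand-in for a real item hash.
std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

Registers make_sketch() { return Registers(kNumRegisters, 0); }

// The low kLgK hash bits pick a register; the register keeps the maximum
// rank (1 + number of trailing zeros) seen in the remaining bits.
void update(Registers& regs, std::uint64_t item) {
    const std::uint64_t h = mix64(item);
    const std::size_t idx = static_cast<std::size_t>(h & (kNumRegisters - 1));
    const std::uint64_t rest = h >> kLgK;  // 64 - kLgK usable bits
    std::uint8_t rank = 1;
    while (rank <= 64 - kLgK && ((rest >> (rank - 1)) & 1ULL) == 0) ++rank;
    regs[idx] = std::max(regs[idx], rank);
}

// Union is a register-wise max, so the result does not depend on the order
// in which partial sketches (e.g. from different fragments) arrive.
Registers merge(const Registers& a, const Registers& b) {
    assert(a.size() == b.size());
    Registers out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = std::max(a[i], b[i]);
    return out;
}

// Classic HLL estimate computed only from register values, so it is a pure
// function of the register state (registers are summed in index order).
double estimate(const Registers& regs) {
    const double m = static_cast<double>(regs.size());
    double sum = 0.0;
    std::size_t zeros = 0;
    for (std::uint8_t r : regs) {
        sum += std::ldexp(1.0, -static_cast<int>(r));
        if (r == 0) ++zeros;
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;
    if (raw <= 2.5 * m && zeros > 0) {
        return m * std::log(m / static_cast<double>(zeros));  // linear counting
    }
    return raw;
}

// Builds a sketch by feeding items in the given order.
Registers sketch_of(const std::vector<std::uint64_t>& items) {
    Registers regs = make_sketch();
    for (std::uint64_t v : items) update(regs, v);
    return regs;
}
```

Feeding the same items forward and reversed, or splitting them across two partial sketches and merging in either order, yields identical registers and therefore identical estimates in this toy. Whether the real library keeps any additional order-dependent state on top of its registers is a question for its own documentation; the sketch only demonstrates the register-level property.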

Apache Impala integration with DataSketches HLL (C++)

2020-04-27 Thread Gabor Kaszab
Hey, I'm an Apache Impala (distributed, fast, SQL query engine on big data) contributor and recently started working on pulling in HLL sketching from DataSketches. I managed to put a PoC together where Impala runs a count(distinct) estimate on a column of a table where in the background it uses Da