On Wed, 2020-07-29 at 17:32 -0700, Peter Geoghegan wrote: > How did you test this? What kind of difference are we talking about?
Essentially: initHyperLogLog(&hll, 5) for i in 0 .. one billion addHyperLogLog(&hll, hash(i)) estimateHyperLogLog The numbers are the same regardless of bwidth. Before my patch, it takes about 15.6s. After my patch, it takes about 6.6s, so it's more than a 2X speedup (including the hash calculation). As a part of HashAgg, when I test the worst case, it goes from a 4-5% penalty to ~1% (within noise). > I think that you should change back the rhs() variable names to match > HyperLogLog upstream (as well as the existing rhs() comments). Done. > > I think it's overall an improvement, but > > addHyperLogLog() itself seemed to show up as a cost, so it can hurt > > spilled-but-still-in-memory cases. I'd also like to backpatch this > > to > > 13 (as I already did for 9878b643), if that's acceptable. > > I still wonder if it was ever necessary to add HLL to abbreviated > keys. It only served to avoid some pretty narrow worse cases, at the > expense of typical cases. Given that the existing users of HLL are > pretty narrow, and given the importance of preserving the favorable > performance characteristics of hash aggregate, I'm inclined to agree > that it's worth backpatching to 13 now. Assuming it is a really > measurable cost in practice. Yes, the difference (at least in a tight loop, on my machine) is not subtle. I went ahead and committed and backpatched. Regards, Jeff Davis