Re: HyperLogLog.c and pg_leftmost_one_pos32()

Jeff Davis Thu, 30 Jul 2020 09:22:13 -0700

On Wed, 2020-07-29 at 17:32 -0700, Peter Geoghegan wrote:
> How did you test this? What kind of difference are we talking about?


Essentially:
  initHyperLogLog(&hll, 5)
  for i in 0 .. one billion
    addHyperLogLog(&hll, hash(i))
  estimateHyperLogLog

The numbers are the same regardless of bwidth.

Before my patch, it takes about 15.6s. After my patch, it takes about
6.6s, so it's more than a 2X speedup (including the hash calculation).

As a part of HashAgg, when I test the worst case, it goes from a 4-5%
penalty to ~1% (within noise).

> I think that you should change back the rhs() variable names to match
> HyperLogLog upstream (as well as the existing rhs() comments).

Done.

> > I think it's overall an improvement, but
> > addHyperLogLog() itself seemed to show up as a cost, so it can hurt
> > spilled-but-still-in-memory cases. I'd also like to backpatch this
> > to
> > 13 (as I already did for 9878b643), if that's acceptable.
> 
> I still wonder if it was ever necessary to add HLL to abbreviated
> keys. It only served to avoid some pretty narrow worse cases, at the
> expense of typical cases. Given that the existing users of HLL are
> pretty narrow, and given the importance of preserving the favorable
> performance characteristics of hash aggregate, I'm inclined to agree
> that it's worth backpatching to 13 now. Assuming it is a really
> measurable cost in practice.

Yes, the difference (at least in a tight loop, on my machine) is not
subtle. I went ahead and committed and backpatched.

Regards,
        Jeff Davis

Re: HyperLogLog.c and pg_leftmost_one_pos32()

Reply via email to