On Thu, 2020-07-30 at 19:16 +0200, Tomas Vondra wrote:
> > Essentially:
> >    initHyperLogLog(&hll, 5)
> >    for i in 0 .. one billion
> >        addHyperLogLog(&hll, hash(i))
> >    estimateHyperLogLog
> >
> > The numbers are the same regardless of bwidth.
> >
> > Before my patch, it takes about 15.6s. After my patch, it takes about
> > 6.6s, so it's more than a 2X speedup (including the hash calculation).
>
> Wow. That's a huge improvement.
To be clear: the 2X+ speedup was on the tight loop test.

> What does the whole test (data + query) look like? Is it a particularly
> rare / special case, or something reasonable to expect in practice?

The whole-query test was:

config:
  shared_buffers=8GB
  jit = off
  max_parallel_workers_per_gather=0

setup:
  create table t_1m_20(i int);
  vacuum (freeze, analyze) t_1m_20;
  insert into t_1m_20 select (random()*1000000)::int4
    from generate_series(1,20000000);

query:
  set work_mem='2048kB';
  SELECT pg_prewarm('t_1m_20', 'buffer');
  -- median of the three runs
  select distinct i from t_1m_20 offset 10000000;
  select distinct i from t_1m_20 offset 10000000;
  select distinct i from t_1m_20 offset 10000000;

results:
  f2130e77 (before using HLL):        6787ms
  f1af75c5 (before my recent commit): 7170ms
  fd734f38 (master now):              6990ms

My previous results, taken before I committed the patch (and therefore not
on the same exact SHA1s), were 6812, 7158, and 6898. So my most recent
batch of results is slightly worse, but the most recent commit (fd734f38)
still shows an improvement of a couple percent.

Regards,
	Jeff Davis