Hi,

On 12/17/2015 10:28 PM, Tomas Vondra wrote:
> Hi,
>
> On 12/17/2015 07:20 PM, Robert Haas wrote:
>> ...
>> If this doesn't regress performance in the case where the number of
>> buckets is estimated accurately to begin with, then I think this is a
>> great idea. Can you supply some performance test results for that
>> case, and maybe some of the other cases also?
>
> I don't see how it could regress performance, and the benchmarks I've
> done confirm that. I'll do more thorough benchmarking and post the
> results here, but not now as this patch is in the 2016-01 CF and I want
> to put all my time into reviewing patches from the open commitfest.
I've finally got around to doing more thorough benchmarks, and surprisingly there seems to be a minor regression. The scripts I've used are attached, along with the results, but in short the test joins two tables, in different size combinations:

    1M x 10M
    10M x 100M
    5M x 250M

with different work_mem sizes (so that the smallest value uses a number of batches while the largest value runs with a single batch).

The 1x10 and 10x100 data sets are designed to fit into RAM on the machine (including the temporary files in batching mode), while 5x250 is designed to specifically force the temp files to disk (but it's on a fast SSD array, so not a big deal).
Average timings for current master and the first two patches of the series (0001 postpones the build of buckets and 0002 always starts without batching) look like this (the timings are in milliseconds):
   size    work_mem     master   postpone   no-batch
  ----------------------------------------------------
   1x10          8        5735       5760       6107
                32        5939       5955       6124
               128        4852       5038       5175
  ----------------------------------------------------
   5x250        64      158512     158429     159008
               256      144046     144584     144994
              1024      111478     117529     116933
  ----------------------------------------------------
   10x100       64       68389      68278      68530
               256       66693      66415      66605
              1024       48606      50654      50490

So compared to master, the differences look like this:

   size    work_mem    postpone   no-batch
  -------------------------------------------
   1x10          8       0.44%      6.50%
                32       0.28%      3.13%
               128       3.83%      6.65%
  -------------------------------------------
   5x250        64      -0.05%      0.31%
               256       0.37%      0.66%
              1024       5.43%      4.89%
  -------------------------------------------
   10x100       64      -0.16%      0.21%
               256      -0.42%     -0.13%
              1024       4.21%      3.88%

So for the first patch (postponing the build of buckets) with batching, the difference is rather negligible, possibly within noise (~0.5%). When the hash join runs with a single batch, however, the difference gets much more significant: 4-5%.
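For clarity, the percentages are just the difference relative to master, i.e. (patched - master) / master, so positive numbers mean the patched version is slower. For example, for the 1x10 data set with work_mem=128, the postpone column is (5038 - 4852) / 4852 = 3.83%.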
I'm not going to discuss the second patch for now, but naturally it has the same issue and apparently also some additional ones (at least on the 1x10 data set).
I have spent a fair amount of time profiling this, but I can't wrap my head around where the difference comes from. Essentially, we don't do any additional work; we've just moved some of the existing work to a different place in the code.
It might change the memory access pattern a bit, but the original pattern is not exactly sequential either, as what we're moving around are random updates to an array of pointers. Or rather, we're moving them into the BuildBuckets() method, so if something consumes more time, it should show up in that method, I believe.
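To make the "moved to a different place" part concrete, here's a minimal standalone sketch of the idea (not the actual patch - the Tuple/HashTable structs and the build_buckets() name are made up for illustration): tuples are first accumulated densely with no bucket maintenance at all, and only once the final tuple count is known do we size the bucket array and link the tuples into it in a single pass, which is roughly the shape of the BuildBuckets() step discussed above:

/* toy illustration of postponing the bucket build (not the actual patch) */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct Tuple
{
    uint32_t      hashvalue;   /* precomputed hash of the join key */
    struct Tuple *next;        /* bucket chain link, filled in later */
    int           payload;
} Tuple;

typedef struct HashTable
{
    Tuple  **buckets;          /* array of bucket heads, built late */
    int      nbuckets;         /* power of two */
    Tuple   *tuples;           /* dense array of all inserted tuples */
    int      ntuples;
} HashTable;

static void
build_buckets(HashTable *ht)
{
    int i;

    /* size the bucket array from the actual tuple count, not the estimate */
    ht->nbuckets = 1;
    while (ht->nbuckets < ht->ntuples)
        ht->nbuckets <<= 1;

    ht->buckets = calloc(ht->nbuckets, sizeof(Tuple *));

    /* single pass: all the random writes to the pointer array happen here */
    for (i = 0; i < ht->ntuples; i++)
    {
        Tuple *tup = &ht->tuples[i];
        int    b = tup->hashvalue & (ht->nbuckets - 1);

        tup->next = ht->buckets[b];
        ht->buckets[b] = tup;
    }
}

int
main(void)
{
    HashTable ht = {0};
    int       n = 1000000;
    int       i;

    /* phase 1: just accumulate tuples, no bucket maintenance at all */
    ht.tuples = malloc(n * sizeof(Tuple));
    for (i = 0; i < n; i++)
    {
        ht.tuples[i].hashvalue = (uint32_t) i * 2654435761u;   /* cheap hash */
        ht.tuples[i].payload = i;
    }
    ht.ntuples = n;

    /* phase 2: postponed bucket build, once the tuple count is known */
    build_buckets(&ht);

    printf("built %d buckets for %d tuples\n", ht.nbuckets, ht.ntuples);

    free(ht.buckets);
    free(ht.tuples);
    return 0;
}

The point of the postponement is that nbuckets can be derived from the real tuple count rather than the planner estimate, at the cost of concentrating all the random pointer updates into that single pass.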
So I've measured how much time the function takes for the 1x10 and 10x100 datasets, and it's ~25ms and ~290ms respectively. That's way less than the actual difference in query runtime, which is ~190ms and ~2050ms.
So even if we entirely skipped the bucket build, we would not recover the difference. Not sure what's going on here.
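For the record, measuring the per-call time of that function doesn't need anything fancy; a sketch along these lines (reusing the toy HashTable and build_buckets() from the previous snippet, and simply wrapping the call in clock_gettime() - not how the numbers above were actually collected) is enough to ballpark it:

/* Sketch only: timing a single bucket-build pass with clock_gettime().
 * Reuses the toy HashTable and build_buckets() from the previous snippet. */
#include <stdio.h>
#include <time.h>

static double
elapsed_ms(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}

static void
time_bucket_build(HashTable *ht)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    build_buckets(ht);                  /* the postponed bucket-build pass */
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("bucket build: %.1f ms for %d tuples\n",
           elapsed_ms(start, end), ht->ntuples);
}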
I've also done some profiling using perf, but I haven't really found anything suspicious there.
Any ideas what might be the cause of this?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: hash-scripts.tgz (application/compressed-tar)
Attachment: hashes-delayed.ods (application/vnd.oasis.opendocument.spreadsheet)