On 01/19/2016 08:34 PM, Robert Haas wrote:
> On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>>>> If this doesn't regress performance in the case where the number of
>>>> buckets is estimated accurately to begin with, then I think this is
>>>> a great idea. Can you supply some performance test results for that
>>>> case, and maybe some of the other cases also?

>>> I don't see how it could regress performance, and the benchmarks I've
>>> done confirm that. I'll do more thorough benchmarking and post the
>>> results here, but not right now, as this patch is in the 2016-01 CF
>>> and I want to put all my time into reviewing patches from the open
>>> commitfest.

>> I've finally got to do more thorough benchmarks, and surprisingly there
>> seems to be a minor regression.
>> ...

>> So even if we entirely skipped the bucket build, we would not recover
>> the difference. Not sure what's going on here.

>> I've also done some profiling using perf, but I haven't really found
>> anything suspicious there.

>> Any ideas what might be the cause of this?

> My guess is that memory locality is not as good with this patch.
> Consider this:
>
>     copyTuple->next = hashtable->buckets[bucketno];
>
> We've only just populated copyTuple via memcpy, so that cache line is
> presumably in whatever place cache lines go when they are really hot.
> We basically make one pass over the allocated space for the hash
> table, filling in the hash tuples and linking things into bucket
> chains.  OTOH, with the patch, we don't do that on the first pass,
> just writing the tuples.  Then when we come back through we still have
> to do that part:
>
>     hashTuple->next = hashtable->buckets[bucketno];
>
> By the time we're doing this, especially at larger work_mem settings,
> we've probably pushed those cache lines out of the faster levels of
> the CPU cache, and we have to pull them back in again, and that takes
> time.  If the tuples are small, we'll dirty every cache line twice;
> if they're larger, we might not dirty every cache line on the second
> pass, but just some fraction of them.
>
> I could be totally off base, but this is why I was concerned about the patch.
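
To restate the two access patterns as a self-contained sketch (the types
below are simplified stand-ins I'm making up for illustration, not the
actual HashJoinTuple/HashJoinTable definitions from hashjoin.h; the real
second pass walks dense memory chunks rather than a pointer array):

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-ins for the real HashJoinTuple / HashJoinTable. */
    typedef struct HashTuple
    {
        struct HashTuple *next;     /* bucket chain link */
        uint32_t          hashvalue;
        /* tuple payload would follow here */
    } HashTuple;

    typedef struct HashTable
    {
        HashTuple **buckets;        /* bucket heads */
        uint32_t    nbuckets;       /* power of two */
        HashTuple **tuples;         /* all stored tuples, in load order */
        size_t      ntuples;
    } HashTable;

    /*
     * Unpatched behavior: link each tuple into its bucket right after
     * copying it, while its cache line is still hot.
     */
    static void
    insert_immediately(HashTable *ht, HashTuple *copyTuple)
    {
        uint32_t bucketno = copyTuple->hashvalue & (ht->nbuckets - 1);

        copyTuple->next = ht->buckets[bucketno];
        ht->buckets[bucketno] = copyTuple;
    }

    /*
     * Patched behavior: a separate second pass over the stored tuples.
     * By the time it runs, the lines written early in the load phase
     * have likely been evicted from the faster cache levels, so every
     * link dirties a cold line a second time.
     */
    static void
    build_buckets(HashTable *ht)
    {
        for (size_t i = 0; i < ht->ntuples; i++)
        {
            HashTuple *tuple = ht->tuples[i];
            uint32_t   bucketno = tuple->hashvalue & (ht->nbuckets - 1);

            tuple->next = ht->buckets[bucketno];
            ht->buckets[bucketno] = tuple;
        }
    }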

I can totally see why this would slow down the BuildBuckets function
itself, but I don't quite see why it should make the other code
significantly slower. Yet BuildBuckets takes just ~25ms while the total
duration increases by ~200ms (for the 1x10 dataset).

I do understand that calling BuildBuckets may affect the code executed
after it, as it evicts other data from the cache, but I would not expect
that to account for ~175ms.

Yet another thing is that BuildBuckets accesses the tuples in a mostly
sequential way, so I'd expect the prefetcher to take care of that.
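
That said, the sequential reads are only half of it: the stores into
hashtable->buckets[] land at effectively random offsets, which no
prefetcher can predict. One quick experiment that might confirm or rule
that out, reusing the stand-in types from the sketch above and
GCC/Clang's __builtin_prefetch (purely hypothetical, not something the
patch does), would be to prefetch the destination bucket head a few
tuples ahead:

    /*
     * Hypothetical variant of build_buckets() above: software-prefetch
     * the bucket head needed a few iterations from now.  If the slowdown
     * is dominated by the random stores into ht->buckets[], this should
     * claw part of it back; if it changes nothing, the cost lies
     * elsewhere.
     */
    static void
    build_buckets_prefetch(HashTable *ht)
    {
        const size_t lookahead = 4;     /* arbitrary, would need tuning */

        for (size_t i = 0; i < ht->ntuples; i++)
        {
            if (i + lookahead < ht->ntuples)
            {
                uint32_t ahead = ht->tuples[i + lookahead]->hashvalue
                                 & (ht->nbuckets - 1);

                __builtin_prefetch(&ht->buckets[ahead], 1, 0);  /* write */
            }

            HashTuple *tuple = ht->tuples[i];
            uint32_t   bucketno = tuple->hashvalue & (ht->nbuckets - 1);

            tuple->next = ht->buckets[bucketno];
            ht->buckets[bucketno] = tuple;
        }
    }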

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

