Re: speeding up planning with partitions

Tom Lane Sun, 31 Mar 2019 10:05:09 -0700

Amit Langote <amitlangot...@gmail.com> writes:
> On Sun, Mar 31, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i...@live.jp> 
> wrote:
>> Certainly, using bitmapset contributes to the performance when scanning
>> one partition(few partitions) from large partitions.


> Thanks Imai-san for testing.

I tried to replicate these numbers with the code as-committed, and
could not.  What I get, using the same table-creation code as you
posted and a pgbench script file like

\set param random(1, :N)
select * from rt where a = :param;

is scaling like this:

N       tps, range      tps, hash

2       10520.519932    10415.230400
8       10443.361457    10480.987665
32      10341.196768    10462.551167
128     10370.953849    10383.885128
512     10207.578413    10214.049394
1024    10042.794340    10121.683993
4096    8937.561825     9214.993778
8192    8247.614040     8486.728918

If I use "-M prepared" the numbers go up a bit for lower N, but
drop at high N:

N       tps, range      tps, hash

2       11449.920527    11462.253871
8       11530.513146    11470.812476
32      11372.412999    11450.213753
128     11289.351596    11322.698856
512     11095.428451    11200.683771
1024    10757.646108    10805.052480
4096    8689.165875     8930.690887
8192    7301.609147     7502.806455

Digging into that, it seems like the degradation with -M prepared is
mostly in LockReleaseAll's hash_seq_search over the locallock hash table.
What I think must be happening is that with -M prepared, at some point the
plancache decides to try a generic plan, which causes opening/locking all
the partitions, resulting in permanent bloat in the locallock hash table.
We immediately go back to using custom plans, but hash_seq_search has
more buckets to look through for the remainder of the process' lifetime.

I do see some cycles getting spent in apply_scanjoin_target_to_paths
that look to be due to scanning over the long part_rels array,
which your proposal would ameliorate.  But (a) that's pretty small
compared to other effects, and (b) IMO, apply_scanjoin_target_to_paths
is a remarkable display of brute force inefficiency to begin with.
I think we should see if we can't nuke that function altogether in
favor of generating the paths with the right target the first time.

BTW, the real elephant in the room is the O(N^2) cost of creating
these tables in the first place.  The runtime for the table-creation
scripts looks like

N       range           hash

2       0m0.011s        0m0.011s
8       0m0.015s        0m0.014s
32      0m0.032s        0m0.030s
128     0m0.132s        0m0.099s
512     0m0.969s        0m0.524s
1024    0m3.306s        0m1.442s
4096    0m46.058s       0m15.522s
8192    3m11.995s       0m58.720s

This seems to be down to the expense of doing RelationBuildPartitionDesc
to rebuild the parent's relcache entry for each child CREATE TABLE.
Not sure we can avoid that, but maybe we should consider adopting a
cheaper-to-read representation of partition descriptors.  The fact that
range-style entries seem to be 3X more expensive to load than hash-style
entries is strange.

                        regards, tom lane

Re: speeding up planning with partitions

Reply via email to