Hi, Amit. Thank you for your reply.

> I didn't load any data into tables when running the tests, because these 
> patches are meant for reducing planner latency.  More specifically, they're 
> addressed to fix the current planner behavior that its latency increases with 
> increasing number of partitions, with focus on the common case where only a 
> single partition will need to be scanned by a given query.

Thank you for telling me details. It is very helpful.

> No need to try to measure the INSERT latency, because its planning latency is 
> not affected by the number of partitions.  Moreover, I'd rather suggest you 
> take out the INSERT statement from the benchmark for now, because its 
> execution time does vary unfavorably with increasing number of partitions.

Thank you for your advice.

> In fact, just measure non-prepared tps and latency, because we're only 
> interested in planning time here.

I got it.

> Hmm, since 0001 is meant to improve update planning time, it seems odd that 
> you'd get poorer results compared to base source.  But, it seems base source 
> is actually master + the patch to improve the execution time of update, so 
> maybe that patch is playing a part here, although I'm not sure why even 
> that's making this much of a difference.
> I suggest that you use un-patched master as base source, that is, leave out 
> any patches to improve execution time.

> By the way, as you may have noticed, I posted a version 2 of the patches on 
> this thread.  If you apply them, you will be be able to see almost same TPS 
> for any number of partitions with master + 0001 + 0002 + 0003.

> I guess if you had applied the latest version of 
> "Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch" (which is v8 posted 
> at [1]) on top of master + 0001 + 0002 + 0003, you'd get better performance 
> too.

Thank you. I will try out these cases.

Thanks,

--
Sho Kato
-----Original Message-----
From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp] 
Sent: Monday, September 3, 2018 2:12 PM
To: Kato, Sho/加藤 翔 <kato-...@jp.fujitsu.com>; Pg Hackers 
<pgsql-hack...@postgresql.org>
Subject: Re: speeding up planning with partitions

Thank you Kato-san for testing.

On 2018/08/31 19:48, Kato, Sho wrote:
> Hi, Amit
> 
> Great!
> With the total number of records being 6400, I benchmarked while increasing 
> the number of partitions from 100 to 6400.
> Applying three all patches, 20% performance improved for 100 partitions.
> 
> I have the same number of records for each partition, do you do the same?

I didn't load any data into tables when running the tests, because these 
patches are meant for reducing planner latency.  More specifically, they're 
addressed to fix the current planner behavior that its latency increases with 
increasing number of partitions, with focus on the common case where only a 
single partition will need to be scanned by a given query.

I'd try to avoid using a benchmark whose results is affected by anything other 
than the planning latency.  It will be a good idea if you leave the tables 
empty and just vary the number of partitions and nothing else.

Also, we're interested in planning latency, so using just SELECT and UPDATE in 
your benchmark script will be enough, because their planning time is affected 
by the number of partitions.  No need to try to measure the INSERT latency, 
because its planning latency is not affected by the number of partitions.  
Moreover, I'd rather suggest you take out the INSERT statement from the 
benchmark for now, because its execution time does vary unfavorably with 
increasing number of partitions.  Sure, there are other patches to address 
that, but it's better not to mix the patches to avoid confusion.

> Also, in my case, performance was better when not prepare.

Patches in this thread do nothing for the execution, so, there is no need to 
compare prepared vs non-prepared.  In fact, just measure non-prepared tps and 
latency, because we're only interested in planning time here.

> I think these patches do not improve execute case, so we need faster runtime 
> pruning patch[1], right?

We already have run-time pruning code (that is the code in the patch you
linked) committed into the tree in PG 11, so the master tree also has it.
But since we're not interested in execution time, no need to worry about 
run-time pruning.

> Details of measurement conditions and results are as follows.
> - base source
>   master(@777e6ddf17) + Speeding up Insert v8 patch[1]
>
> - table definition(e.g. 100 partition)
>   create table test.accounts(aid serial, abalance int)
>   partition by range(aid);
>   create table test.accounts_history(id serial, aid int, delta int,
>   mtime timestamp without time zone)
>   partition by range(aid);
>
> - command option
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n
>   
> -results
>   base source no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 662.414805 |          0.357 |          0.265 |          0.421
>         200 | 494.478431 |          0.439 |          0.349 |          0.579
>         400 | 307.982089 |          0.651 |          0.558 |          0.927
>         800 | 191.360676 |          0.979 |          0.876 |          1.548
>        1600 |  75.344947 |          2.253 |          2.003 |          4.301
>        3200 |  30.643902 |          5.716 |          4.955 |         10.118
>        6400 |  16.726056 |         12.512 |          8.582 |         18.054
> 
>   0001 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 429.816329 |          0.811 |           0.75 |          0.365
>         200 | 275.211531 |          1.333 |          1.248 |          0.501
>         400 | 160.499833 |          2.384 |          2.252 |          0.754
>         800 |  79.387776 |          4.935 |          4.698 |          1.468
>        1600 |  24.787377 |         16.593 |         15.954 |          4.302
>        3200 |   9.846421 |          42.96 |         42.139 |          8.848
>        6400 |   4.919772 |          87.43 |         83.109 |          16.56

Hmm, since 0001 is meant to improve update planning time, it seems odd that 
you'd get poorer results compared to base source.  But, it seems base source is 
actually master + the patch to improve the execution time of update, so maybe 
that patch is playing a part here, although I'm not sure why even that's making 
this much of a difference.

I suggest that you use un-patched master as base source, that is, leave out any 
patches to improve execution time.

[ ... ]

>   0001 + 0002 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 |  682.53091 |          0.388 |           0.35 |           0.35
>         200 | 469.906601 |          0.543 |          0.496 |           0.51
>         400 | 321.915349 |           0.78 |          0.721 |          0.752
>         800 | 201.620975 |          1.246 |          1.156 |          1.236
>        1600 |  94.438204 |          2.612 |          2.335 |          2.745
>        3200 |  38.292922 |          6.657 |          5.579 |          6.808
>        6400 |   21.48462 |         11.989 |         10.104 |         12.601
> 

[ ... ]

>        
>   0001 + 0002 + 0003 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 798.962029 |          0.304 |          0.267 |          0.339
>         200 | 577.893396 |          0.384 |          0.346 |          0.487
>         400 | 426.542177 |          0.472 |          0.435 |          0.717
>         800 | 288.616213 |           0.63 |          0.591 |          1.162
>        1600 | 154.247034 |          1.056 |          0.987 |          2.384
>        3200 |  59.711446 |          2.416 |          2.233 |          6.514
>        6400 |  37.109761 |          3.387 |          3.099 |         11.762
>        

[ ... ]

By the way, as you may have noticed, I posted a version 2 of the patches on 
this thread.  If you apply them, you will be be able to see almost same TPS for 
any number of partitions with master + 0001 + 0002 + 0003.

>   Although it may not be related to this, when measured with pg11 beta2, 
> somehow the performance was better.
> 
>   11beta2 + v1-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch[3] 
> prepared
>    part_num |   tps_ex    | update_latency | select_latency | insert_latency 
>   ----------+-------------+----------------+----------------+----------------
>      100    | 756.07228   |          0.942 |          0.091 |          0.123

I guess if you had applied the latest version of 
"Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch" (which is v8 posted at 
[1]) on top of master + 0001 + 0002 + 0003, you'd get better performance too.  
But, as mentioned above, we're interested in measuring planning latency, not 
execution latency, so we should leave out any patches that are meant toward 
improving execution latency to avoid confusion.


Thanks again.

Regards,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f9T_32Xpb-p8cWwo5ezSfVhXviUW8QTWncP8ksPHDRK8g%40mail.gmail.com


Reply via email to