Hello Andres,

> I don't want to post a full series right now, but my working state is
> available on
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
> git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush

Below are the results of a series of pgbench tests exercising checkpoints on the above version, as fetched.

Overall comments:
 - sorting & flushing is basically always a winner
 - benchmarking with short runs on large databases is a bad idea:
   the results are very different when a longer run is used
   (see andres00b vs andres00c)

# HOST/SOFT

 16 GB RAM, 2 CPUs, 8 cores
 200 GB RAID1 HDD, ext4 FS
 Ubuntu 12.04 LTS (precise)

# ABOUT THE REPORTED STATISTICS

 tps: the tps reported by pgbench, excluding connection establishment;
   the higher the better
 1-sec tps: average of the measured per-second tps
   note - it should be the same as the previous one, but due to various
          hazards in the trace, especially when things go badly and pg gets
          stuck, it may differ. Such hazards also explain why non-integer
          tps may be reported for some seconds.
 stddev: standard deviation of the per-second tps, the lower the better
 the five figures in brackets give a feel for the distribution:
 - min: minimal per-second tps seen in the trace
 - q1: first quartile of the per-second tps seen in the trace
 - med: median per-second tps seen in the trace
 - q3: third quartile of the per-second tps seen in the trace
 - max: maximal per-second tps seen in the trace
 the last percentage, dubbed "<=10.0", is the percentage of seconds where
   performance was at or below 10 tps: this measures how unresponsive pg was
   during the run (see the sketch below)
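
 For reference, these figures are derived from the per-second progress trace
 roughly as in the minimal Python sketch below. This is not the exact script
 used for the numbers reported here; it assumes the pgbench -P 1 progress
 lines (which pgbench prints during the run) have been captured to a file or
 pipe.

  import re, statistics, sys

  # collect per-second tps values from pgbench -P 1 progress lines, e.g.
  #   progress: 12.0 s, 2574.1 tps, lat 1.553 ms stddev 0.421
  tps = []
  for line in sys.stdin:
      m = re.match(r"progress: [\d.]+ s, ([\d.]+) tps", line)
      if m:
          tps.append(float(m.group(1)))

  tps.sort()
  q1, med, q3 = statistics.quantiles(tps, n=4)   # quartiles (Python 3.8+)
  print("1-sec tps: %.1f ± %.1f" % (statistics.mean(tps), statistics.pstdev(tps)))
  print("[min q1 med q3 max]: [%.1f, %.1f, %.1f, %.1f, %.1f]"
        % (tps[0], q1, med, q3, tps[-1]))
  print("<=10.0: %.1f%%" % (100.0 * sum(t <= 10.0 for t in tps) / len(tps)))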

###### TINY2

 pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
   with scale = 10 (~ 200 MB)

 postgresql.conf:
   shared_buffers = 1GB
   max_wal_size = 1GB
   checkpoint_timeout = 300s
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }
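
 In the result tables, "opts" is the checkpoint_flush_after setting ("head"
 corresponds to "none", i.e. the current behaviour with neither sorting nor
 flushing) and "#" numbers the repeated runs for that setting. Each
 configuration is driven roughly along the lines of the following Python
 sketch (hypothetical paths and driver code, not the actual scripts used):

  import subprocess

  DATADIR = "/path/to/data"   # hypothetical data directory
  RUNS = 3                    # repeated runs per setting (the "#" column)

  # assumes the database has already been initialized, e.g. pgbench -i -s 10
  for flush_after in ("0", "32", "64"):   # the "opts" column
      # set the GUC on the patched build and restart; "head" runs use an
      # unpatched build instead of a checkpoint_flush_after value
      subprocess.run(["psql", "-c",
                      "ALTER SYSTEM SET checkpoint_flush_after = %s" % flush_after],
                     check=True)
      subprocess.run(["pg_ctl", "-D", DATADIR, "-w", "restart"], check=True)
      for run in range(RUNS):
          with open("tiny2-%s-%d.trace" % (flush_after, run), "w") as trace:
              subprocess.run(["pgbench", "-M", "prepared", "-N", "-P", "1",
                              "-T", "4000", "-j", "2", "-c", "4"],
                             stdout=trace, stderr=subprocess.STDOUT, check=True)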

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 2574.1 / 2574.3 ± 367.4 [229.0, 2570.1, 2721.9, 2746.1, 2857.2] 0.0%
      1 | 2575.0 / 2575.1 ± 359.3 [  1.0, 2595.9, 2712.0, 2732.0, 2847.0] 0.1%
      2 | 2602.6 / 2602.7 ± 359.5 [ 54.0, 2607.1, 2735.1, 2768.1, 2908.0] 0.0%

    0 0 | 2583.2 / 2583.7 ± 296.4 [164.0, 2580.0, 2690.0, 2717.1, 2833.8] 0.0%
      1 | 2596.6 / 2596.9 ± 307.4 [296.0, 2590.5, 2707.9, 2738.0, 2847.8] 0.0%
      2 | 2604.8 / 2605.0 ± 300.5 [110.9, 2619.1, 2712.4, 2738.1, 2849.1] 0.0%

   32 0 | 2625.5 / 2625.5 ± 250.5 [  1.0, 2645.9, 2692.0, 2719.9, 2839.0] 0.1%
      1 | 2630.2 / 2630.2 ± 243.1 [301.8, 2654.9, 2697.2, 2726.0, 2837.4] 0.0%
      2 | 2648.3 / 2648.4 ± 236.7 [570.1, 2664.4, 2708.9, 2739.0, 2844.9] 0.0%

   64 0 | 2587.8 / 2587.9 ± 306.1 [ 83.0, 2610.1, 2680.0, 2731.0, 2857.1] 0.0%
      1 | 2591.1 / 2591.1 ± 305.2 [455.9, 2608.9, 2680.2, 2734.1, 2859.0] 0.0%
      2 | 2047.8 / 2046.4 ± 925.8 [  0.0, 1486.2, 2592.6, 2691.1, 3001.0] 0.2% ?

Pretty small setup, all data fit in shared buffers. Good tps performance all
around (best with checkpoint_flush_after = 32), and flushing shows a
noticeable reduction (360 -> 240) in tps stddev.

###### SMALL

 pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
   with scale = 120 (~ 2 GB)

 postgresql.conf:
   shared_buffers = 2GB
   checkpoint_timeout = 300s
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 209.2 / 204.2 ± 516.5 [0.0,   0.0,   4.0,    5.0, 2251.0] 82.3%
      1 | 207.4 / 204.2 ± 518.7 [0.0,   0.0,   4.0,    5.0, 2245.1] 82.3%
      2 | 217.5 / 211.0 ± 530.3 [0.0,   0.0,   3.0,    5.0, 2255.0] 82.0%
      3 | 217.8 / 213.2 ± 531.7 [0.0,   0.0,   4.0,    6.0, 2261.9] 81.7%
      4 | 230.7 / 223.9 ± 542.7 [0.0,   0.0,   4.0,    7.0, 2282.0] 80.7%

    0 0 | 734.8 / 735.5 ± 879.9 [0.0,   1.0,  16.5, 1748.3, 2281.1] 47.0%
      1 | 694.9 / 693.0 ± 849.0 [0.0,   1.0,  29.5, 1545.7, 2428.0] 46.4%
      2 | 735.3 / 735.5 ± 888.4 [0.0,   0.0,  12.0, 1781.2, 2312.1] 47.9%
      3 | 736.0 / 737.5 ± 887.1 [0.0,   1.0,  16.0, 1794.3, 2317.0] 47.5%
      4 | 734.9 / 735.1 ± 885.1 [0.0,   1.0,  15.5, 1781.0, 2297.1] 47.2%

   32 0 | 738.1 / 737.9 ± 415.8 [0.0, 553.0, 679.0,  753.0, 2312.1]  0.2%
      1 | 730.5 / 730.7 ± 413.2 [0.0, 546.5, 671.0,  744.0, 2319.0]  0.1%
      2 | 741.9 / 741.9 ± 416.5 [0.0, 556.0, 682.0,  756.0, 2331.0]  0.2%
      3 | 744.1 / 744.1 ± 414.4 [0.0, 555.5, 685.2,  758.0, 2285.1]  0.1%
      4 | 746.9 / 746.9 ± 416.6 [0.0, 566.6, 685.0,  759.0, 2308.1]  0.1%

   64 0 | 743.0 / 743.1 ± 416.5 [1.0, 555.0, 683.0,  759.0, 2353.0]  0.1%
      1 | 742.5 / 742.5 ± 415.6 [0.0, 558.2, 680.0,  758.2, 2296.0]  0.1%
      2 | 742.5 / 742.5 ± 415.9 [0.0, 559.0, 681.1,  757.0, 2310.0]  0.1%
      3 | 529.0 / 526.6 ± 410.9 [0.0, 245.0, 444.0,  701.0, 2380.9]  1.5% ??
      4 | 734.8 / 735.0 ± 414.1 [0.0, 550.0, 673.0,  754.0, 2298.0]  0.1%

Sorting multiplies tps by about 3.3, and flushing significantly reduces the
tps stddev. Pg goes from about 80% unresponsive to nearly always responsive.

###### MEDIUM

 pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
   with scale = 250 (~ 3.8 GB)

 postgresql.conf:
   shared_buffers = 4GB
   max_wal_size = 4GB
   checkpoint_timeout = 15min
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 |  214.8 /  211.8 ± 513.7 [0.0,   1.0,    4.0,    5.0, 2344.0] 82.4%
      1 |  219.2 /  215.0 ± 524.1 [0.0,   0.0,    4.0,    5.0, 2316.0] 82.2%
      2 |  240.9 /  234.6 ± 550.8 [0.0,   0.0,    4.0,    6.0, 2320.2] 81.0%

    0 0 | 1064.7 / 1065.3 ± 888.2 [0.0,  11.0, 1089.0, 2017.7, 2461.9] 24.7%
      1 | 1060.2 / 1061.2 ± 889.9 [0.0,  10.0, 1056.7, 2022.0, 2444.9] 25.1%
      2 | 1060.2 / 1061.4 ± 889.1 [0.0,   9.0, 1085.8, 2002.8, 2473.0] 25.6%

   32 0 | 1059.4 / 1059.4 ± 476.3 [3.0, 804.9,  980.0, 1123.0, 2448.1]  0.1%
      1 | 1062.5 / 1062.6 ± 475.6 [0.0, 807.0,  988.0, 1132.0, 2441.0]  0.1%
      2 | 1063.7 / 1063.7 ± 475.4 [0.0, 814.0,  987.0, 1131.2, 2432.1]  0.1%

   64 0 | 1052.6 / 1052.6 ± 475.3 [0.0, 793.0,  974.0, 1118.1, 2445.1]  0.1%
      1 | 1059.8 / 1059.8 ± 475.1 [0.0, 799.0,  987.5, 1131.0, 2457.1]  0.1%
      2 | 1058.5 / 1058.5 ± 472.8 [0.0, 807.0,  985.0, 1127.7, 2442.0]  0.1%

Sorting multiplies tps by about 4.8, and flushing significantly reduces the
tps stddev. Pg goes from over 80% unresponsive to nearly always responsive.

Performance is significantly better than "small" above, probably thanks to
the longer checkpoint timeout.


###### LARGE

 pgbench -M prepared -N -P 1 -T 7500 -j 2 -c 4
   with scale = 1000 (~ 15 GB)

 postgresql.conf:
   shared_buffers = 4GB
   max_wal_size = 2GB
   checkpoint_timeout = 40min
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 |  68.7 /  65.3 ± 78.6 [0.0,  3.0,   6.0, 136.0, 291.0] 53.1%
      1 |  70.6 /  70.3 ± 80.1 [0.0,  4.0,  10.0, 151.0, 282.0] 50.1%
      2 |  74.3 /  75.8 ± 84.9 [0.0,  4.0,   9.0, 162.0, 311.2] 50.3%

    0 0 | 117.2 / 116.9 ± 83.8 [0.0, 14.0, 139.0, 193.0, 372.4] 24.0%
      1 | 117.3 / 117.8 ± 83.8 [0.0, 16.0, 140.0, 193.0, 279.0] 23.9%
      2 | 117.6 / 118.2 ± 84.1 [0.0, 16.0, 141.0, 194.0, 297.8] 23.7%

   32 0 | 114.2 / 114.2 ± 45.7 [0.0, 84.0, 100.0, 131.0, 613.6]  0.4%
      1 | 112.5 / 112.6 ± 44.0 [0.0, 83.0,  98.0, 130.0, 293.0]  0.2%
      2 | 108.0 / 108.0 ± 44.7 [0.0, 79.0,  94.0, 124.0, 303.6]  0.3%

   64 0 | 113.0 / 113.0 ± 45.5 [0.0, 83.0,  99.0, 131.0, 289.0]  0.4%
      1 |  80.0 /  80.3 ± 39.1 [0.0, 56.0,  72.0,  95.0, 281.0]  0.8% ??
      2 | 112.2 / 112.3 ± 44.5 [0.0, 82.0,  99.0, 129.0, 282.0]  0.3%

Data do not fit in the available memory, so there are plenty of read accesses.
Sorting still improves tps somewhat (about 1.6 times), and flushing greatly
improves responsiveness.


###### ANDRES00

 pgbench -M prepared -N -P 1 -T 300 -c 16 -j 16
   with scale = 800 (~ 13 GB)

 postgresql.conf:
   shared_buffers = 2GB
   max_wal_size = 100GB
   wal_level = hot_standby
   maintenance_work_mem = 2GB
   checkpoint_timeout = 30s
   checkpoint_completion_target = 0.8
   synchronous_commit = off
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 328.7 / 329.9 ± 716.9 [0.0, 0.0,   0.0,    0.0, 3221.2] 77.7%
      1 | 338.2 / 338.7 ± 728.6 [0.0, 0.0,   0.0,   17.0, 3296.3] 75.0%
      2 | 304.5 / 304.3 ± 705.5 [0.0, 0.0,   0.0,    0.0, 3463.4] 79.3%

    0 0 | 425.6 / 464.0 ± 724.0 [0.0, 0.0,   0.0, 1000.6, 3363.7] 61.0%
      1 | 461.5 / 463.1 ± 735.8 [0.0, 0.0,   0.0, 1011.2, 3490.9] 58.7%
      2 | 452.4 / 452.6 ± 744.3 [0.0, 0.0,   0.0, 1078.9, 3631.9] 63.3%

   32 0 | 514.4 / 515.8 ± 651.8 [0.0, 0.0, 337.4,  808.3, 2876.0] 40.7%
      1 | 512.0 / 514.6 ± 661.6 [0.0, 0.0, 317.6,  690.8, 3315.8] 35.0%
      2 | 529.5 / 530.3 ± 673.0 [0.0, 0.0, 321.1,  906.4, 3360.8] 40.3%

   64 0 | 529.6 / 530.9 ± 668.2 [0.0, 0.0, 322.1,  786.1, 3538.0] 33.3%
      1 | 496.4 / 498.0 ± 606.6 [0.0, 0.0, 321.4,  746.0, 2629.6] 36.3%
      2 | 521.0 / 521.7 ± 657.0 [0.0, 0.0, 328.4,  737.9, 3262.9] 34.3%

Data just about fit in memory, maybe. The run is very short and the settings
are low; this is not representative of a sane installation, but is meant to
exercise many checkpoints in a difficult situation. Sorting and flushing do
bring significant benefits.


###### ANDRES00b (same as ANDRES00 but scale 800->1000)

 pgbench -M prepared -N -P 1 -T 300 -c 16 -j 16
   with scale = 1000 (~ 15 GB)

 postgresql.conf:
  shared_buffers = 2GB
  max_wal_size = 100GB
  wal_level = hot_standby
  maintenance_work_mem = 2GB
  checkpoint_timeout = 30s
  checkpoint_completion_target = 0.8
  synchronous_commit = off
  checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 150.2 / 150.3 ± 401.6 [0.0,   0.0,   0.0,   0.0, 2199.4] 75.1%
      1 | 139.2 / 139.2 ± 372.2 [0.0,   0.0,   0.0,   0.0, 2111.4] 78.3% ***
      2 | 127.3 / 127.1 ± 341.2 [0.0,   0.0,   0.0,  53.0, 2144.3] 74.7% ***

    0 0 | 199.0 / 209.2 ± 400.4 [0.0,   0.0,   0.0, 243.6, 1846.0] 65.7%
      1 | 220.4 / 226.7 ± 423.2 [0.0,   0.0,   0.0, 264.0, 1777.0] 63.5% *
      2 | 195.5 / 205.3 ± 337.9 [0.0,   0.0, 123.0, 212.0, 1721.9] 43.2%

   32 0 | 362.3 / 359.0 ± 308.4 [0.0, 200.0, 265.0, 416.4, 1816.6]  5.0%
      1 | 323.6 / 321.2 ± 327.1 [0.0, 142.9, 210.0, 353.4, 1907.0]  4.0%
      2 | 309.0 / 310.7 ± 381.3 [0.0, 122.0, 175.5, 298.0, 2090.4]  5.0%

   64 0 | 342.7 / 343.6 ± 331.1 [0.0, 143.0, 239.5, 409.9, 1623.6]  5.3%
      1 | 333.8 / 328.2 ± 356.3 [0.0, 132.9, 211.5, 358.1, 1629.1] 10.7% ??
      2 | 352.0 / 352.0 ± 332.3 [0.0, 163.5, 239.9, 400.1, 1643.4]  5.3%

A little larger than the previous setup, so that the data does not really fit
in memory. The performance impact is significant compared to the previous run.
Sorting and flushing multiply tps by about 2, and unresponsiveness drops from
75% to a much better 5%.

###### ANDRES00c (same as ANDRES00b but time 300 -> 4000)

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 115.2 / 114.3 ± 256.4 [0.0,   0.0,  75.0, 131.1, 3389.0] 46.5%
      1 | 118.4 / 117.9 ± 248.3 [0.0,   0.0,  87.0, 151.0, 3603.6] 46.7%
      2 | 120.1 / 119.2 ± 254.4 [0.0,   0.0,  91.0, 143.0, 3307.8] 43.8%

    0 0 | 217.4 / 211.0 ± 237.1 [0.0, 139.0, 193.0, 239.0, 3115.4] 16.8%
      1 | 216.2 / 209.6 ± 244.9 [0.0, 138.9, 188.0, 231.0, 3331.3] 16.3%
      2 | 218.6 / 213.8 ± 246.7 [0.0, 137.0, 187.0, 232.0, 3229.6] 16.2%

   32 0 | 146.6 / 142.5 ± 234.5 [0.0,  59.0,  93.0, 151.1, 3294.7] 17.5%
      1 | 148.0 / 142.6 ± 239.2 [0.0,  64.0,  95.9, 144.0, 3361.8] 16.0%
      2 | 147.6 / 140.4 ± 233.2 [0.0,  59.4,  94.0, 148.0, 3108.4] 18.0%

   64 0 | 145.3 / 140.5 ± 233.6 [0.0,  61.0,  93.0, 147.7, 3212.6] 16.5%
      1 | 145.6 / 140.3 ± 233.3 [0.0,  58.0,  93.0, 146.0, 3351.8] 17.3%
      2 | 147.7 / 142.2 ± 233.2 [0.0,  61.0,  97.0, 148.4, 3616.3] 17.0%

The only difference between ANDRES00B and ANDRES00C is the duration, from
5 minutes to about 66 minutes. This shows that short runs can be widely
misleading: in particular, the longer run shows less than half the tps for
some settings (e.g. about 147 tps vs 320-360 tps with flushing enabled), and
the relative comparison of head vs sort vs sort+flush is different.

###### ANDRES00d (same as ANDRES00b but wal_level hot_standby->minimal)

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 191.6 / 195.1 ± 439.3 [0.0,   0.0,   0.0,   0.0, 2540.2] 76.3%
      1 | 211.3 / 213.6 ± 461.9 [0.0,   0.0,   0.0,  13.0, 3203.7] 75.0%
      2 | 152.4 / 154.9 ± 217.6 [0.0,   0.0,  58.0, 235.6,  995.9] 39.3% ???

    0 0 | 247.2 / 251.7 ± 454.0 [0.0,   0.0,   0.0, 375.3, 2592.4] 67.7%
      1 | 215.4 / 232.7 ± 446.5 [0.0,   0.0,   0.0, 103.0, 3046.7] 72.3%
      2 | 160.6 / 160.8 ± 222.1 [0.0,   0.0,  80.0, 209.6,  885.3] 42.0% ???

   32 0 | 399.9 / 397.0 ± 356.6 [0.0,  67.0, 348.0, 572.8, 2604.2] 21.0%
      1 | 391.8 / 392.5 ± 371.7 [0.0,  85.5, 314.4, 549.3, 2590.3] 20.7%
      2 | 406.1 / 404.8 ± 380.6 [0.0,  95.0, 348.5, 569.0, 3383.7] 21.3%

   64 0 | 395.9 / 396.1 ± 352.4 [0.0,  89.5, 342.5, 556.0, 2366.9] 17.7%
      1 | 355.1 / 351.9 ± 296.7 [0.0, 172.5, 306.1, 468.1, 1663.5] 16.0%
      2 | 403.6 / 401.8 ± 390.5 [0.0,   0.0, 337.0, 636.1, 2591.3] 26.7% ???

###### ANDRES00e (same as ANDRES00b but maintenance_work_mem=2GB->64MB)

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 153.5 / 161.3 ± 401.3 [0.0,   0.0,   0.0,   0.0, 2546.0] 82.0%
      1 | 170.7 / 175.9 ± 399.9 [0.0,   0.0,   0.0,  14.0, 2537.4] 74.7%
      2 | 184.7 / 190.4 ± 389.2 [0.0,   0.0,   0.0, 158.5, 2544.6] 69.3%

    0 0 | 211.2 / 227.8 ± 418.8 [0.0,   0.0,   0.0, 334.6, 2589.3] 65.7%
      1 | 221.7 / 226.0 ± 415.7 [0.0,   0.0,   0.0, 276.8, 2588.2] 68.4%
      2 | 232.5 / 233.2 ± 403.5 [0.0,   0.0,   0.0, 377.0, 2260.2] 62.0%

   32 0 | 373.2 / 374.4 ± 309.2 [0.0, 180.6, 321.8, 475.2, 2596.5] 11.3%
      1 | 348.7 / 348.1 ± 328.4 [0.0, 127.0, 284.1, 451.9, 2595.1] 17.3%
      2 | 376.3 / 375.3 ± 315.5 [0.0, 186.5, 329.6, 487.1, 2365.4] 15.3%

   64 0 | 388.9 / 387.8 ± 348.7 [0.0, 164.0, 305.9, 546.5, 2587.2] 15.0%
      1 | 380.3 / 378.7 ± 338.8 [0.0, 171.1, 317.4, 524.8, 2592.4] 16.7%
      2 | 369.8 / 367.4 ± 340.5 [0.0,  77.4, 320.6, 525.5, 2484.7] 20.7%

Hmmm, interesting: maintenance_work_mem seems to have some influence on
performance, although the effect is not very consistent across settings,
probably because memory is used up to its limit, so performance is quite
sensitive to how much is available.

--
Fabien.