On Wed, Jun 01, 2016 at 03:33:18PM -0700, Andres Freund wrote:
> On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> > On Fri, May 27, 2016 at 12:37 AM, Andres Freund <and...@anarazel.de> wrote:
> > > I don't think the situation is quite that simple. By *disabling*
> > > backend flushing it's also easy to see massive performance
> > > regressions. In situations where shared buffers was configured
> > > appropriately for the workload (not the case here IIRC).
> >
> > On what kind of workload does setting backend_flush_after=0 represent
> > a large regression vs. the default settings?
> >
> > I think we have to consider that pgbench and parallel copy are pretty
> > common things to want to do, and a non-zero default setting hurts
> > those workloads a LOT.
>
> I don't think pgbench's workload has much to do with reality. Even less
> so in the setup presented here.
>
> The slowdown comes from the fact that default pgbench randomly, but
> uniformly, updates a large table. Which is slower with
> backend_flush_after if the workload is considerably bigger than
> shared_buffers, but, and that's a very important restriction, the
> workload at the same time largely fits into less than
> /proc/sys/vm/dirty_ratio (20%), probably even
> /proc/sys/vm/dirty_background_ratio (10%), of the free OS memory.
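For concreteness, those kernel thresholds are easy to inspect; a sketch,
assuming Linux, with the stock 20/10 defaults shown rather than values
measured on any particular machine:

  $ cat /proc/sys/vm/dirty_ratio             # writers block once dirty data hits this % of available memory
  20
  $ cat /proc/sys/vm/dirty_background_ratio  # background writeback starts at this %
  10

On a box with 64 GiB of free memory, those defaults put the window you
describe at roughly 6.4-12.8 GiB of dirty data.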
Looking at some of the top hits for 'postgresql shared_buffers':

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
https://www.postgresql.org/docs/current/static/runtime-config-resource.html
http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html
https://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/
http://www.cybertec.at/2014/02/postgresql-9-3-shared-buffers-performance-1/

Sizing choices mentioned (some in comments on a main post):

1. .25 * RAM
2. min(8GB, .25 * RAM)
3. Sizing procedure that arrived at 4GB for 900GB of data
4. Equal to data size

Thus, it is not outlandish to have the write portion of a working set
exceed shared_buffers while remaining under 10-20% of system RAM.
Choice (4) won't achieve that, but (2) and (3) may achieve it given a
mere 64 GiB of RAM: choice (2) then caps shared_buffers at 8 GiB and
choice (3) gives 4 GB, while 10-20% of RAM is 6.4-12.8 GiB, leaving room
for a write set to land between the two. Choice (1) can go either way;
if read-mostly data occupies half of shared_buffers, then writes passing
through the other 12.5% of system RAM may exhibit the property you
describe.

Incidentally, a typical reason for a site to use a low shared_buffers
setting is to avoid the very latency spikes that the *_flush_after
settings combat:
https://www.postgresql.org/message-id/flat/4DDE2705020000250003DD4F%40gw.wicourts.gov

> > I have a really hard time believing that the benefits on other
> > workloads are large enough to compensate for the slowdowns we're
> > seeing here.
>
> As a random example, without looking for good parameters, on my laptop:
> pgbench -i -q -s 1000
>
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
> (note the -N)
>
> disabled:
> latency average = 2.774 ms
> latency stddev = 10.388 ms
> tps = 5761.883323 (including connections establishing)
> tps = 5762.027278 (excluding connections establishing)
>
> 128:
> latency average = 2.543 ms
> latency stddev = 3.554 ms
> tps = 6284.069846 (including connections establishing)
> tps = 6284.184570 (excluding connections establishing)
>
> Note the latency stddev, which is 3x better. And the improved
> throughput.

That is an improvement, and the workload is no less realistic than the
ones that showed regressions.

> Which means that transactional workloads that are bigger than the OS
> memory, or which have a non-uniform distribution leading to some
> locality, are likely to be faster. In practice those are *hugely* more
> likely than the uniform distribution that pgbench has.

That is formally true; non-benchmark workloads rarely issue perfectly
uniform writes. However, enough non-benchmark workloads have too little
locality to benefit from caching. Those struggle under *_flush_after
just as uniform writes do, so discounting uniform writes wouldn't
simplify this project.

Today's defaults for *_flush_after greatly smooth and accelerate
performance for one class of plausible workloads while greatly slowing a
different class of plausible workloads.

nm
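P.S. For anyone wanting to reproduce the comparison above, both
configurations are reachable without recompiling; a sketch (assumes a
superuser connection; the two values are just the endpoints under
debate, taken from Andres's run, not recommendations):

  # disable backend writeback control entirely:
  psql -c "ALTER SYSTEM SET backend_flush_after = 0"
  # or set it to the value tested above (128 blocks * 8kB = 1MB):
  psql -c "ALTER SYSTEM SET backend_flush_after = 128"
  # apply without a server restart:
  psql -c "SELECT pg_reload_conf()"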