On Mon, Jul 01, 2019 at 04:32:15PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote:
> > I think we should consider changing the effective_io_concurrency default
> > value, i.e. the guc that determines how many pages we try to prefetch in
> > a couple of places (the most important being Bitmap Heap Scan).
> Maybe we need to improve the way it's used / implemented instead - it seems
> just too hard to determine the correct setting as currently implemented.
Sure, if we can improve those bits, that'd be nice. It's definitely hard
to decide what value is appropriate for a given storage system. But I'm
not sure it's something we can do easily, considering how opaque the
hardware is for us ...
I wonder
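
FWIW, for anyone not familiar with the current implementation: the GUC is
interpreted as a number of spindles and translated into a prefetch-distance
target of roughly n * H(n) pages (the harmonic-number heuristic in
ComputeIoConcurrency() in bufmgr.c), which bitmap heap scan then uses as the
maximum number of pages to prefetch ahead. A minimal standalone sketch of
that translation (not the actual source, just the formula):

#include <stdio.h>

/*
 * Sketch of how effective_io_concurrency (interpreted as a number of
 * drives/spindles) is turned into a number-of-pages-to-prefetch target.
 * The expected number of in-flight requests needed to keep n drives
 * busy is n * H(n), where H(n) is the n-th harmonic number.
 */
static double
prefetch_target(int io_concurrency)
{
    double      target = 0.0;

    for (int i = 1; i <= io_concurrency; i++)
        target += (double) io_concurrency / (double) i;

    return target;
}

int
main(void)
{
    int         values[] = {1, 2, 4, 8, 16, 64, 128};
    int         nvalues = sizeof(values) / sizeof(values[0]);

    for (int i = 0; i < nvalues; i++)
        printf("e_i_c = %3d -> prefetch target ~%.1f pages\n",
               values[i], prefetch_target(values[i]));

    return 0;
}

So e_i_c=1 means a single page in flight, while e_i_c=4 already allows ~8,
which probably explains part of the gap between the 1 and 4 columns in the
tables below.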
> > In some cases it helps a bit, but a bit higher value (4 or 8) performs
> > significantly better. Consider for example this "sequential" data set
> > from the 6xSSD RAID system (x-axis shows e_i_c values, pct means what
> > fraction of pages matches the query):
> I assume that the y axis is the time of the query?
The y-axis is the fraction of the table matched by the query. The values in
the contingency table are query durations (average of 3 runs, but the
numbers were very close).
> How much data is this compared to memory available for the kernel to do
> caching?
Multiple of RAM, in all cases. The queries were hitting random subsets of
the data, and the page cache was dropped after each test, to eliminate
cross-query caching.
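
(Dropping the cache was presumably the usual sync + echo 3 >
/proc/sys/vm/drop_caches; a trivial sketch of that step in C, in case anyone
wants to script it when reproducing this - needs root:)

#include <stdio.h>
#include <unistd.h>

/*
 * Drop the Linux page cache (plus dentries/inodes) between benchmark runs,
 * equivalent to: sync; echo 3 > /proc/sys/vm/drop_caches.  Must run as root.
 */
int
main(void)
{
    FILE       *f;

    sync();                 /* flush dirty pages so clean pages can be dropped */

    f = fopen("/proc/sys/vm/drop_caches", "w");
    if (f == NULL)
    {
        perror("drop_caches");
        return 1;
    }

    fputs("3\n", f);        /* 3 = pagecache + dentries + inodes */
    fclose(f);

    return 0;
}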
   pct  e_i_c=0  e_i_c=1  e_i_c=4 e_i_c=16 e_i_c=64 e_i_c=128
   ------------------------------------------------------------
     1    25990    18624     3269     2219     2189     2171
     5    88116    60242    14002     8663     8560     8726
    10   120556    99364    29856    17117    16590    17383
    25   101080   184327    79212    47884    46846    46855
    50   130709   309857   163614   103001    94267    94809
    75   126516   435653   248281   156586   139500   140087
Compared to the e_i_c=0 case, it looks like this (each cell is the duration
relative to e_i_c=0, e.g. 18624/25990 ~ 72% for the 1% row):
   pct  e_i_c=1  e_i_c=4 e_i_c=16 e_i_c=64 e_i_c=128
   ----------------------------------------------------
     1      72%      13%       9%       8%       8%
     5      68%      16%      10%      10%      10%
    10      82%      25%      14%      14%      14%
    25     182%      78%      47%      46%      46%
    50     237%     125%      79%      72%      73%
    75     344%     196%     124%     110%     111%
So for 1% of the table e_i_c=1 is faster by about 30%, but with e_i_c=4
(or more) it's ~10x faster. This is a fairly common pattern, not just on
this storage system.
e_i_c=1 can also perform pretty poorly, especially when the query matches
a large fraction of the table - in this example it's 2-3x slower than no
prefetching at all, while higher e_i_c values limit the damage quite a bit.
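
For completeness, the prefetching being benchmarked here IIRC boils down to
posix_fadvise(POSIX_FADV_WILLNEED) hints issued for the heap pages the bitmap
says we'll need (PrefetchBuffer() -> FilePrefetch() on Linux). A standalone
sketch of that call, with a made-up file path and block number, in case it
helps when experimenting with the kernel / scheduler side:

#define _POSIX_C_SOURCE 200112L

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * Sketch of the prefetch hint issued per heap page: an asynchronous
 * "will need" advice for one 8kB block.  The path and block number are
 * made up for illustration only.
 */
int
main(void)
{
    const char *path = "/mnt/data/base/16384/16397";   /* hypothetical relation file */
    off_t       blkno = 1234;                           /* hypothetical block number */
    int         fd = open(path, O_RDONLY);
    int         rc;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* ask the kernel to start reading this block in the background */
    rc = posix_fadvise(fd, blkno * BLCKSZ, BLCKSZ, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}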
> I'm surprised the slowdown for small e_i_c values is that big - it's not
> obvious to me why that is. Which os / os version / filesystem / io
> scheduler / io scheduler settings were used?
This is the system with the NVMe storage and the SATA RAID:
Linux bench2 4.19.26 #1 SMP Sat Mar 2 19:50:14 CET 2019 x86_64 Intel(R)
Xeon(R) CPU E5-2620 v4 @ 2.10GHz GenuineIntel GNU/Linux
/dev/nvme0n1p1 on /mnt/data type ext4 (rw,relatime)
/dev/md0 on /mnt/raid type ext4 (rw,relatime,stripe=48)
The other system looks pretty much the same (same kernel, ext4).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services