On Mon, Jul 01, 2019 at 04:32:15PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote:
> > I think we should consider changing the effective_io_concurrency default
> > value, i.e. the guc that determines how many pages we try to prefetch in
> > a couple of places (the most important being Bitmap Heap Scan).
>
> Maybe we need to improve the way it's used / implemented instead - it seems
> just too hard to determine the correct setting as currently implemented.


Sure, if we can improve those bits, that'd be nice. It's definitely hard
to decide what value is appropriate for a given storage system. But I'm
not sure it's something we can do easily, considering how opaque the
hardware is for us ...
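
FWIW for experimenting with this, the GUC can be changed per session, per
tablespace or cluster-wide without a restart. A minimal sketch (the
tablespace name is made up):

    # per-session, handy for quick experiments
    psql -c "SET effective_io_concurrency = 8; SHOW effective_io_concurrency;"

    # per-tablespace, e.g. for a tablespace on the SSD RAID
    psql -c "ALTER TABLESPACE ssd_raid SET (effective_io_concurrency = 8);"

    # cluster-wide default, followed by a config reload
    psql -c "ALTER SYSTEM SET effective_io_concurrency = 8;"
    psql -c "SELECT pg_reload_conf();"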

I wonder

> > In some cases it helps a bit, but a bit higher value (4 or 8) performs
> > significantly better. Consider for example this "sequential" data set
> > from the 6xSSD RAID system (x-axis shows e_i_c values, pct means what
> > fraction of pages matches the query):
>
> I assume that the y axis is the time of the query?


The y-axis is the fraction of the table matched by the query. The values in
the contingency table are query durations (average of 3 runs, but the
numbers were very close).

> How much data is this compared to memory available for the kernel to do
> caching?


A multiple of RAM, in all cases. The queries were hitting random subsets of
the data, and the page cache was dropped after each test, to eliminate
cross-query caching.
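
I.e. each iteration was essentially equivalent to something like this (a
simplified sketch with made-up table/column names, not the actual script):

    # drop the OS page cache between runs
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    # the BETWEEN range is picked at random, sized to match the desired
    # fraction (pct) of the table; assumes an index on t.a so the planner
    # picks a bitmap heap scan
    psql -c "SET effective_io_concurrency = 16;
             EXPLAIN (ANALYZE, BUFFERS)
             SELECT * FROM t WHERE a BETWEEN 123000 AND 125000;"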


   pct   e_i_c=0   e_i_c=1   e_i_c=4  e_i_c=16  e_i_c=64 e_i_c=128
   ---------------------------------------------------------------
     1     25990     18624      3269      2219      2189      2171
     5     88116     60242     14002      8663      8560      8726
    10    120556     99364     29856     17117     16590     17383
    25    101080    184327     79212     47884     46846     46855
    50    130709    309857    163614    103001     94267     94809
    75    126516    435653    248281    156586    139500    140087

Compared to the e_i_c=0 case, the durations look like this (e.g. for pct=1
and e_i_c=1 it's 18624/25990 ~ 72% of the no-prefetch duration):

   pct   e_i_c=1   e_i_c=4  e_i_c=16  e_i_c=64 e_i_c=128
   ------------------------------------------------------
     1       72%       13%        9%        8%        8%
     5       68%       16%       10%       10%       10%
    10       82%       25%       14%       14%       14%
    25      182%       78%       47%       46%       46%
    50      237%      125%       79%       72%       73%
    75      344%      196%      124%      110%      111%

So for 1% of the table, e_i_c=1 is faster by about 30%, but with e_i_c=4 (or
more) it's ~10x faster. This is a fairly common pattern, not just on this
storage system.

On the other hand, e_i_c=1 can perform pretty poorly, especially when the
query matches a large fraction of the table - in this example it's 2-3x
slower than no prefetching, while higher e_i_c values limit the damage
quite a bit.

> I'm surprised the slowdown for small e_i_c values is that big - it's not
> obvious to me why that is.  Which os / os version / filesystem / io
> scheduler / io scheduler settings were used?


This is the system with the NVMe storage and the SATA RAID:

Linux bench2 4.19.26 #1 SMP Sat Mar 2 19:50:14 CET 2019 x86_64 Intel(R)
Xeon(R) CPU E5-2620 v4 @ 2.10GHz GenuineIntel GNU/Linux

/dev/nvme0n1p1 on /mnt/data type ext4 (rw,relatime)
/dev/md0 on /mnt/raid type ext4 (rw,relatime,stripe=48)

The other system looks pretty much the same (same kernel, ext4).
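
For completeness, the active I/O scheduler for each block device can be read
from sysfs (the entry in brackets is the one in use):

    grep . /sys/block/*/queue/scheduler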


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


