> On 11/07/2022, at 2:05 AM, Justin Pryzby <pry...@telsasoft.com> wrote:
> 
> On Sun, Jul 10, 2022 at 04:55:34PM +1200, Nathan Ward wrote:
>> I am running Postgres 13 on CentOS 7, installed from the yum.postgresql.org 
>> <http://yum.postgresql.org/> repo.
> 
> It doesn't sound relevant, but what kind of storage systems is postgres using 
> ?
> Filesystem, raid, device.

It’s an NVME SSD backed SAN over 2x16G fibre channel. The postgres server is in 
a VM (vmware). I can pretty comfortably do 10Gbit/s or more to the disk (I’ve 
only personally tested that, because initial replication from other sites runs 
at around that sort of speed limited by ethernet interfaces). The normal IO is 
between 1MB/s and maybe 15MB/s - writes only. Reads are pretty minimal.
FS is XFS.

> Is the high CPU use related to to autovacuum/autoanalyze ?

Good question - I don’t know. I’ve set the server to debug1 log level so I can 
see that - I see you have some notes below about autovacuum logs so I’ll see 
what that shows me.
Since setting that log level I haven’t yet had the issue occur - I watched it 
tick over midnight last night and it was normal (as it is, most days).

>> The issue I am having, is that when the daily data usage aggregation runs, 
>> sometimes we have a big performance impact, with the following 
>> characteristics which happen *after* the aggregation job runs in it usual 
>> fast time of 12s or so:
>> - The aggregation runs fast as per normal
>> - Load on the server goes to 30-40 - recall we have quite high “max 
>> connections” to keep throughput high when the client is far (16ms) from the 
>> server
> 
> I suggest to install and enable autoexplain to see what's running slowly here,
> and what its query plans are.  It seems possible that when the daily
> aggregation script drops the old partitions, the plan changes for the worse.
> I'm not sure what the fix is - maybe you just need to run vacuum or analyze on
> the new partitions soon after populating them.

Hmm, I’ll check it out. I hadn’t thought that the query planner could be doing 
something different, that’s a good point.

Note that the normal data ingest queries don’t hit the new partition - the new 
partition is for data 3 days ago, and the ingest queries only hit the 
partitions covering the last ~2 hours.

> For good measure, also set log_autovacuum_min_duration=0 (or something other
> than -1) (and while you're at it, log_checkpoints=on, and log_lock_waits=on if
> you haven't already).

Wilco.

> Note that postgres doesn't automatically analyze parent tables, so you should
> maybe do that whenever the data changes enough for it to matter.

Hmm. This raises some stuff I’m not familiar with - does analysing a parent 
table do anything? I got the impression that analysing the parent was just 
shorthand for analysing all of the attached partitions.

Perhaps because I attach a table with data, the parent sometimes decides it 
needs to run analyse on a bunch of things?
Or, maybe it uses the most recently attached partition, with bad statistics, to 
plan queries that only touch other partitions?

--
Nathan Ward



Reply via email to