Good morning,

This week we've started seeing spikes where COMMITs take much longer than
usual, sometimes several seconds to finish. After a few minutes the spikes
disappear, but then they return seemingly at random. This becomes visible to
the app and end user as a big stall in activity.

The checkpoints are still running for their full 5 min checkpoint_timeout
duration (the logs all say "checkpoint starting: time"), and I'm not seeing
any warnings about them occurring too frequently.
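
For anyone wanting to compare checkpoint timings against the stall windows,
here is a minimal sketch of how the write/sync/total figures could be pulled
out of the server log. It assumes log_checkpoints is on and that the
"checkpoint complete" lines carry "write=... s, sync=... s, total=... s"
fields; the function names and the 1-second threshold are hypothetical, picked
only for illustration.

```python
import re
from typing import Dict, Iterable, List

# Timing fields of a "checkpoint complete" log line. Assumption: the line
# contains "write=<n> s, sync=<n> s, total=<n> s" in that order.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete:.*?"
    r"write=(?P<write>[\d.]+) s, "
    r"sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)


def checkpoint_timings(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Return write/sync/total seconds for each completed checkpoint."""
    timings = []
    for line in lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            timings.append({k: float(v) for k, v in match.groupdict().items()})
    return timings


def slow_syncs(lines: Iterable[str], threshold_s: float = 1.0) -> List[Dict[str, float]]:
    """Keep only checkpoints whose sync phase took at least threshold_s.

    The default threshold is an arbitrary illustration, not a recommendation.
    """
    return [t for t in checkpoint_timings(lines) if t["sync"] >= threshold_s]
```

Passing an open log file to slow_syncs() would list the checkpoints with a
long sync phase, which can then be lined up by hand against the times the
COMMIT stalls were seen.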

This is PostgreSQL 12.4 on Ubuntu 18.04, all running in MS Azure (*not*
managed by them).

# select version();
 PostgreSQL 12.4 (Ubuntu 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

I have the stats_temp_directory in a tmpfs mount. I *do* have pg_wal on the
same premium SSD storage volume as the data directory. Normally I would
know to separate these but I was told with the cloud storage that it's all
virtualized anyway, plus storage IOPS are determined by disk size so having
a smaller volume just for pg_wal would hurt me in this case. The kind folks
in the PG community Slack suggested just having one large premium cloud
storage mount for the data directory and leaving pg_wal inside, because this
virtualization removes any guarantee of true separation.

I'm wondering if others have experience running self-managed PG in a cloud
setting (especially if in MS Azure) and what they might have seen/done in
cases like this.


Don Seiler
