Dear Jeff,

thank you for your answer.

Your question regarding CPUs pointed me in the right direction. In my 
container virtualization I had the CPU limit set to 2, which explains the drop 
in performance (d'oh!). With UNLOGGED tables I get constant wall-clock time for 
up to 4 processes; without UNLOGGED, there is a small penalty for 2 and 4 jobs.

As I can recover from broken bulk imports on these tables at the application 
level, I will recreate them as UNLOGGED and take home that performance gain.
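For reference, an existing table can also be switched in place rather than recreated; a sketch (the partition name below is made up for illustration):

```sql
-- Switch an existing partition to unlogged. The table is rewritten, and
-- its contents are truncated on crash recovery, so this is only safe for
-- data that can be re-imported at the application level.
-- (Partition name is hypothetical.)
ALTER TABLE variants_smallvariant_p01 SET UNLOGGED;
```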

Thank you for your time and expertise!

Best wishes,
Manuel

________________________________
From: Jeff Janes [jeff.ja...@gmail.com]
Sent: Tuesday, August 27, 2019 16:45
To: Holtgrewe, Manuel
Subject: [ext] Re: Pointers towards identifying bulk import bottleneck 
(walwriter tuning?)

On Tue, Aug 27, 2019 at 6:06 AM Holtgrewe, Manuel 
<manuel.holtgr...@bihealth.de<mailto:manuel.holtgr...@bihealth.de>> wrote:
Dear all,

I hope that this is the right place to ask.

I have trouble identifying the bottleneck in a bulk import; I suspect it is the 
walwriter. I'm using PostgreSQL 11, my configuration is at [5]. I'm running on 
a ZFS raidz3 that can write at 300 MB/s+. However, I'm only seeing ~35 MB/s 
from walwriter.


What happens if you turn fsync=off (test only---do not do that on a production 
environment)?  If that doesn't speed things up dramatically, then what happens 
if you make the partitions of variants_smallvariant be unlogged tables?  If 
that doesn't speed things up dramatically either, then you know the bottleneck 
has nothing to do with WAL writing.
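As a sketch of the fsync test (test systems only, as stated above): fsync can be changed with a configuration reload, no restart needed:

```sql
-- TEST ONLY: never run with fsync=off on data you care about.
ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();

-- To restore the previous setting afterwards:
ALTER SYSTEM RESET fsync;
SELECT pg_reload_conf();
```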

What does "top" show?

Sample the contents of wait_event_type and wait_event from pg_stat_activity for 
the  INSERT statements.  What are they waiting on most?
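A minimal sampling query along these lines, run repeatedly while the import is active (e.g. with psql's \watch 1):

```sql
-- Show what the INSERT backends are currently waiting on;
-- NULL wait_event means the backend is on-CPU, not waiting.
SELECT pid, wait_event_type, wait_event, state
  FROM pg_stat_activity
 WHERE query ILIKE 'INSERT%';
```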

iotop tells me that walwriter does not go beyond ~35 MB/s, so maybe this is 
the culprit? Is there a way to tune walwriter I/O performance?


WALwriter is not the exclusive writer of WAL records.  For example the 
user-connected backends also write WAL records.  If you want to know how fast 
WAL is being generated, you should look directly at the amount of WAL generated 
over the course of the benchmark, for example by looking at 
pg_current_wal_lsn() before and after, or looking at the number of wal segments 
getting archived.
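One way to measure this from psql, using \gset to remember the starting LSN (a sketch):

```sql
-- Before the benchmark: remember the current WAL position.
SELECT pg_current_wal_lsn() AS start_lsn \gset

-- ... run the import ...

-- After: total WAL generated in between, in human-readable form.
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn'));
```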

You can mess around with wal_writer_delay and wal_writer_flush_after, but I 
think that is not likely to make much difference.  You can certainly shove the 
burden of doing the writes back and forth between WALwriter and user backends, 
but the total bottleneck is unlikely to change much.
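For completeness, these are the two settings with their PostgreSQL 11 defaults in postgresql.conf; as noted above, changing them mostly shifts the write burden around rather than removing the bottleneck:

```
wal_writer_delay = 200ms       # how often walwriter wakes to flush WAL (default)
wal_writer_flush_after = 1MB   # flush once this much WAL has been written (default)
```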

> With one import process the CPU usage is at 80%, dropping to ~60% for two 
> import processes, and to about 30% for four import processes.

How many CPUs do you have?  Is that 80% of all your CPU, or 80% of just one of 
them?  Is the rest going to IO wait, system, or idle?

Cheers,

Jeff
