Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)

Andres Freund Thu, 29 Aug 2013 17:03:57 -0700

On 2013-08-30 02:53:54 +0300, Ants Aasma wrote:
> On Fri, Aug 30, 2013 at 1:30 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2013-08-30 01:10:40 +0300, Ants Aasma wrote:
> >> On Fri, Aug 30, 2013 at 12:33 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> >> > FWIW, WAL is still the major bottleneck for INSERT heavy workloads. The
> >> > per CPU overhead actually minimally increased (at least in my tests), it
> >> > just scales noticeably better than before.
> >>
> >> Interesting. Do you have any insight into what is behind the CPU
> >> overhead? Maybe the solution is to make WAL insertion cheap enough to
> >> not matter. That won't be easy, but neither are the alternatives.
> >
> > Funnily, by far the biggest thing I have seen in benchmarks is the CRC32
> > computation. I plan to brush up my ~3 year old CRC32 reimplementation
> > patch sometime, but afair you had a much better one?
> >
> > I have some doubts about weakening the hash function by also using FNV
> > or similar here, so I'd first like to try how much of a difference a
> > better CRC32 implementation can make with the current XLogInsert()
> > implementation.
> 
> The CRC32 implementations mostly differ by the number of lookups that
> are done in parallel. PostgreSQL does 1 lookup, IIRC the zlib
> implementation does 4, Intel has a paper that recommends going up to
> 8. The tradeoff is that each level requires a 4KB lookup table - for
> small records the additional cache misses will probably kill any
> speedup.
> 
> A quick overview of the hot cache large buffer performance of a few
> interesting options:
> [interesting data]
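
For readers following along, the "number of lookups done in parallel"
distinction can be sketched in C: a table-driven CRC-32 that does one
lookup per input byte, and a slicing-by-4 variant that folds four bytes
per iteration through four independent tables. This is a minimal sketch,
not PostgreSQL's or zlib's actual code; it assumes the reflected CRC-32
polynomial 0xEDB88320 (the one zlib uses), and all function and table
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed polynomial: reflected CRC-32, as in zlib's crc32(). */
#define CRC32_POLY 0xEDB88320u

/* crc_tab[0] is the classic byte-at-a-time table; crc_tab[1..3] are the
 * extra tables slicing-by-4 needs. Each holds 256 32-bit entries. */
static uint32_t crc_tab[4][256];

/* Build all four tables; call once before either CRC function. */
static void init_crc_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32_POLY : c >> 1;
        crc_tab[0][i] = c;
    }
    /* crc_tab[t][i] is crc_tab[0][i] pushed through t further zero bytes. */
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = crc_tab[0][i];
        for (int t = 1; t < 4; t++) {
            c = crc_tab[0][c & 0xff] ^ (c >> 8);
            crc_tab[t][i] = c;
        }
    }
}

/* One lookup per byte: each step depends on the result of the previous one. */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return ~crc;
}

/* Four independent lookups per 4-byte word, which the CPU can overlap. */
static uint32_t crc32_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len >= 4) {
        /* Assemble the word byte by byte: no alignment/endianness assumptions. */
        crc ^= (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        crc = crc_tab[3][crc & 0xff] ^ crc_tab[2][(crc >> 8) & 0xff] ^
              crc_tab[1][(crc >> 16) & 0xff] ^ crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    /* Tail bytes (and records shorter than 4 bytes) go one at a time. */
    while (len--)
        crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return ~crc;
}
```

Both functions return 0xCBF43926 for the ASCII string "123456789", the
standard CRC-32 check value, and can be chained across buffers by passing
the previous result back in. Note that a very short record spends most or
all of its time in the byte-wise tail loop, while the extra tables still
compete for cache, which is the small-record tradeoff mentioned above.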


I am not sure "hot cache large buffer performance" is really the
interesting case. In common workloads most XLogInsert() calls insert
pretty small records. I vaguely recall trying 8 parallel lookups and
getting worse performance on many workloads, but that might have been
a problem with my implementation.

The reason I'd like to go for a faster CRC32 implementation as a first
step is that it's easy: easy to verify, easy to analyze, easy to back
out. I personally don't have enough interest/time in the 9.4 cycle to
pursue a conversion to a different algorithm (I find the idea of using
different ones on 32/64bit pretty bad), but I obviously won't stop
somebody else ;)
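
The "easy to verify" point can be illustrated: a faster CRC only has to
agree with a slow, obviously correct reference for every length and start
offset. Below is a minimal sketch of such a cross-check, assuming the
reflected CRC-32 polynomial 0xEDB88320 used by zlib. The names
crc32_reference, crc32_candidate and crc32_agree are hypothetical, and the
candidate here is just a byte-at-a-time table version standing in for
whatever optimized routine is actually under test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed polynomial: reflected CRC-32, as in zlib's crc32(). */
#define CRC32_POLY 0xEDB88320u

/* Bit-at-a-time reference: slow, but easy to check by eye. */
static uint32_t crc32_reference(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY : crc >> 1;
    }
    return ~crc;
}

/* Candidate under test (hypothetical stand-in): one table lookup per byte. */
static uint32_t crc32_candidate(const unsigned char *p, size_t len)
{
    static uint32_t tab[256];
    static bool ready = false;
    if (!ready) {
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) ? (c >> 1) ^ CRC32_POLY : c >> 1;
            tab[i] = c;
        }
        ready = true;
    }
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = tab[(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return ~crc;
}

/* Compare both on every length 0..max_len at start offsets 0..7, so
 * misaligned heads and short tails are exercised too. */
static bool crc32_agree(size_t max_len, unsigned seed)
{
    unsigned char *buf = malloc(max_len + 8);
    if (buf == NULL)
        return false;           /* treat allocation failure as a failed check */
    srand(seed);
    for (size_t i = 0; i < max_len + 8; i++)
        buf[i] = (unsigned char)(rand() & 0xff);

    bool ok = true;
    for (size_t off = 0; off < 8 && ok; off++)
        for (size_t len = 0; len <= max_len && ok; len++)
            ok = crc32_reference(buf + off, len) == crc32_candidate(buf + off, len);
    free(buf);
    return ok;
}
```

Swapping a slicing-by-N routine in as crc32_candidate reuses the same
harness unchanged; because every (offset, length) pair is compared, a bug
in the word-at-a-time loop or in the byte-wise tail handling shows up
immediately.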

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)