Hi,

I agree that pg_compresslog should be aware of all the WAL records' details so that it can optimize archive log safely. In my patch, I've examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe.

Also I agree further pg_compresslog maintenance needs to examine changes in WAL record format. Because the number of such format will be limited, I think the amount of work will be reasonable enough.

Regards;

Simon Riggs wrote:
On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:
"Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]> writes:
But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned.

I think its right to question it, certainly.

That's what bothers me about this patch, too.  It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

I don't really understand this concern. Koichi-san has included a
parameter setting that would prevent any change at all in the way WAL is
written. If you don't want this slight increase in WAL, don't enable it.
If you do enable it, you'll also presumably be compressing the xlog too,
which works much better than gzip using less CPU. So overall it saves
more than it costs, ISTM, and nothing at all if you choose not to use
it.

The main arguments that I could see against Andreas' alternative are:

1. Some WAL record types are arranged in a way that actually would not
permit the reconstruction of the short form from the long form, because
they throw away too much data when the full-page image is substituted.
An example that's fresh in my mind is that the current format of the
btree page split WAL record discards newitemoff in that case, so you
couldn't identify the inserted item in the page image.  Now this is only
saving two bytes in what's usually going to be a darn large record
anyway, and it complicates the code to do it, so I wouldn't cry if we
changed btree split to include newitemoff always.  But there might be
some other cases where more data is involved.  In any case, someone
would have to look through every single WAL record type to determine
whether reconstruction is possible and fix it if not.

2. The compresslog utility would have to have specific knowledge about
every compressible WAL record type, to know how to convert it to the
short format.  That means an ongoing maintenance commitment there.
I don't think this is unacceptable, simply because we need only teach
it about a few common record types, not everything under the sun ---
anything it doesn't know how to fix, just leave alone, and if it's an
uncommon record type it really doesn't matter.  (I guess that means
that we don't really have to do #1 for every last record type, either.)

So I don't think either of these is a showstopper.  Doing it this way
would certainly make the patch more acceptable, since the argument that
it might hurt rather than help performance in some cases would go away.

Yeh, its additional code paths, but it sounds like Koichi-san and
colleagues are going to be trail blazing any bugs there and will be
around to fix any more that emerge.

What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.
I don't care for that, as it pretty much destroys some of the more
important sanity checks that xlog replay does.  The page boundaries
need to match the records contained in them.  So I think we do need
to have pg_decompresslog insert dummy WAL entries to fill up the
space saved by omitting full pages.

Agreed. I don't want to start touching something that works so well.


We've been thinking about doing this for at least 3 years now, so I
don't see any reason to baulk at it now. I'm happy with Koichi-san's
patch as-is, assuming further extensive testing will be carried out on
it during beta.



--
-------------
Koichi Suzuki

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to