Josh;

I'd like to explain again what the term "compression" means in my proposal, and to show a comparison of resource consumption with cp and gzip.

My proposal is to remove unnecessary full page writes (they are needed in crash recovery to recover from inconsistent or partial writes) when we copy WAL to the archive log, and to rebuild them as dummies when we restore from the archive log. The dummies are needed to maintain LSNs. So it is very different from general-purpose compression such as gzip, although pg_compresslog does compress the archive log as a result.
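
A minimal sketch of the idea in Python may make this concrete. It assumes a toy record model: the `Record` class, its fields, and the helper functions are hypothetical names for illustration only, not the real WAL record format and not the actual pg_compresslog code. It shows only that dropping page images shrinks the archive, and that putting dummies of the same length back keeps every later record at its original LSN.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    lsn: int                           # byte offset of the record in the WAL stream
    payload: bytes                     # logical change, always kept
    full_page: Optional[bytes] = None  # page image, used only in crash recovery
    removed_len: int = 0               # size of the page image dropped at archive time
    dummy_len: int = 0                 # size of the dummy filler added at restore time


def stored_size(rec: Record) -> int:
    """Bytes the record occupies in whatever form it is currently stored."""
    return len(rec.payload) + len(rec.full_page or b"") + rec.dummy_len


def build_wal(items: List[Tuple[bytes, Optional[bytes]]]) -> List[Record]:
    """Lay records out back to back, so each LSN is a running byte offset."""
    records, lsn = [], 0
    for payload, full_page in items:
        rec = Record(lsn, payload, full_page)
        records.append(rec)
        lsn += stored_size(rec)
    return records


def archive(records: List[Record]) -> List[Record]:
    """Drop page images on the way to the archive, remembering their length."""
    return [
        replace(r, full_page=None, removed_len=len(r.full_page or b""))
        for r in records
    ]


def restore(records: List[Record]) -> List[Record]:
    """Put a dummy of the removed length back so later records keep their LSN."""
    return [replace(r, removed_len=0, dummy_len=r.removed_len) for r in records]


def offsets(records: List[Record]) -> List[int]:
    """Recompute each record's position from the sizes of the records before it."""
    result, pos = [], 0
    for rec in records:
        result.append(pos)
        pos += stored_size(rec)
    return result
```

In this toy model, `offsets(restore(archive(wal)))` equals the list of original LSNs for any `wal` produced by `build_wal`, while the archived form stores only the logical payloads.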

As to CPU and I/O consumption, I've already evaluated them as follows:

1) Collected all the WAL segments.
2) Copied them by different means: cp, pg_compresslog and gzip.

I then compared the elapsed time as well as other resource consumption.

Benchmark: DBT-2
Database size: 120WH (12.3GB)
Total WAL size: 4.2GB (after 60min. run)
Elapsed time:
  cp:            120.6sec
  gzip:          590.0sec
  pg_compresslog: 79.4sec
Resultant archive log size:
  cp:             4.2GB
  gzip:           2.2GB
  pg_compresslog: 0.3GB
Resource consumption:
  cp:             user:   0.5sec system: 15.8sec idle:  16.9sec I/O wait: 87.7sec
  gzip:           user: 286.2sec system:  8.6sec idle: 260.5sec I/O wait: 36.0sec
  pg_compresslog: user:   7.9sec system:  5.5sec idle:  37.8sec I/O wait: 28.4sec

Because the resultant log size is considerably smaller than with cp or gzip, pg_compresslog needs much less I/O, and because its logic is much simpler than gzip's, it consumes very little CPU (7.9sec of user time against 286.2sec for gzip).
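
As a quick check of these figures, the following Python snippet derives size and time ratios relative to pg_compresslog. Nothing is measured here: the values are copied from the benchmark results in this mail, and the variable and function names are made up for the illustration.

```python
# Figures copied from the DBT-2 results quoted in this mail (4.2GB of WAL).
elapsed_sec = {"cp": 120.6, "gzip": 590.0, "pg_compresslog": 79.4}
archive_gb = {"cp": 4.2, "gzip": 2.2, "pg_compresslog": 0.3}
user_cpu_sec = {"cp": 0.5, "gzip": 286.2, "pg_compresslog": 7.9}


def ratio(table, baseline, target="pg_compresslog"):
    """How many times larger the baseline value is than the target value."""
    return table[baseline] / table[target]


for name in ("cp", "gzip"):
    print(
        f"{name} vs pg_compresslog: "
        f"archive size {ratio(archive_gb, name):.1f}x, "
        f"elapsed time {ratio(elapsed_sec, name):.1f}x, "
        f"user CPU {ratio(user_cpu_sec, name):.2f}x"
    )
```

A ratio above 1 means the baseline (cp or gzip) used more of that resource than pg_compresslog; cp's user CPU ratio comes out below 1 because cp does almost no computation.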

The term "compress" may not be appropriate. We may call this "log optimization" instead.

So I don't see any reason why this (at least the optimization "mark" in each log record) can't be integrated.

Simon Riggs wrote:
> On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:
>
>> OK, different question:
>> Why would anyone ever set full_page_compress = off?
>> The only reason I can see is if compression costs us CPU but gains RAM & I/O. I can think of a lot of applications ... benchmarks included ... which are CPU-bound but not RAM or I/O bound. For those applications, compression is a bad tradeoff.
>>
>> If, however, CPU used for compression is made up elsewhere through smaller file processing, then I'd agree that we don't need a switch.

As I wrote in reply to Simon's comment, I have only one concern.

Without a switch, because both full page writes and the corresponding logical log are included in WAL, WAL size will increase slightly (maybe about five percent or so). If everybody is happy with this, we don't need a switch.


> Koichi-san has explained things for me now.
>
> I misunderstood what the parameter did and reading your post, ISTM you
> have as well. I do hope Koichi-san will alter the name to allow
> everybody to understand what it does.


Here are some candidates:
full_page_writes_optimize
full_page_writes_mark: means it marks each full page write as "needed in crash recovery", "needed in archive recovery" and so on.

I don't insist on these names. It would be very helpful if you could suggest a name that reflects what the parameter really means.

Regards;
--
Koichi Suzuki
