Hello,

We'd like to propose a feature to disable WAL to speed up data loading.  This 
was inspired by a feature added in the latest MySQL.  I wish you won't fear 
this feature...


BACKGROUND
========================================

This branches off from [1] as mentioned therein.  Briefly speaking, a customer 
wants to shorten the time for nightly loading of data into their data warehouse 
as much as possible to be prepared for using the data warehouse for new things.

Currently, they are using Oracle's SQL*Loader with its no-logging feature.  
They want a similar feature to migrate to Postgres.  Other than the data 
loading performance, they don't want to be concerned about the storage for 
large volumes of WAL.

In [1], we thought about something like Oracle's per-table no-logging feature, 
but it seems difficult (or at least not easy.)  Meanwhile, I found another 
feature added in the latest MySQL 8.0.21 [2].  This proposal follows it almost 
directly.  That satisfies the customer request.

As an aside, it's also conceivable that in the near future, users could see the 
WAL bottleneck (WAL buffer or disk) when they utilize the parallel COPY that is 
being developed in the community.


FUNCTIONAL SPECIFICATION
========================================

Add a new value 'none' to the server configuration parameter wal_level.  With 
this setting:

* No WAL is emitted.

* The server refuses to start (pg_ctl start fails) after an abnormal shutdown 
due to power outage, pg_ctl's immediate shutdown, etc, showing a 
straightforward message like MySQL.

* Features like continuous archiving, pg_basebackup, and streaming/logical 
replication that requires wal_level >= replica are not available.

* The user can use all features again if you shut down the server successfully 
after data loading and reset wal_level to a value other than none.  He needs to 
take a base backup or rebuild the replication standby after restarting the 
server.


In addition to the cosmetic modifications to the manual articles that refer to 
wal_level, add a clause or paragraphs to the following sections to let users 
know the availability of this feature.

14.4. Populating a Database
18.6.1. Upgrading Data via pg_dumpall


PROGRAM DESIGN (main point only)
========================================

As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() 
does nothing and just returns a fixed value, which is the tail of the last 
shutdown checkpoint WAL record.  As a result, the value is set to the relation 
page header's LSN field.

In addition, it might be worth having XLogBeginInsert() and XLogRec...() to 
check wal_level and just return.  I don't expect much from this, but it may be 
interesting to give it a try to see the squeezed performance.

StartupXLOG() checks the wal_level setting in pg_control and quits the startup 
with ereport(FATAL) accordingly.


[1]
Implement UNLOGGED clause for COPY FROM
https://www.postgresql.org/message-id/osbpr01mb488887c0bdc5129c65dfc5e5ed...@osbpr01mb4888.jpnprd01.prod.outlook.com

[2]
Disabling Redo Logging
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging


Regards
Takayuki Tsunakawa



Reply via email to