From: Osumi, Takamichi/大墨 昂道 <osumi.takami...@fujitsu.com>
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already and is formatted

I think make check-world uses the default setting for wal_level.  You need to 
set wal_level = none and do make installcheck-world.


> I compared two wal_levels both 'minimal' and 'none'.
> For both levels, I measured
> (1) cluster's restore from pg_dumpall output,
> (2) COPY initial data to a defined table as initial data loading, and
> (3) COPY data to a defined table with tuples, 3 times each for all cases.
> After that, calculated the average and the ratio of the loading speed.
> The conclusion is that wal_level=none cuts about 20% of the loading speed
> compared to 'minimal' in the three cases above.

Hmm.  I wonder why pg_dumpall's performance increases by as much as 20%.  On 
the contrary to my previous hope, pg_dumpall uses COPY to restore data, so it 
doesn't emit WAL when wal_level = minimal.  (Is it brought by the difference of 
whether DDL's WAL is emitted or not?)


> Sharing other scenario to measure is welcome.

How about running multiple concurrent data loading sessions when adding data to 
existing tables with data, so that WAL is the bottleneck?  That's the use case 
of the target customer, isn't it?



> The input data was generated from pgbench with 1000 scale factor.
> It's about 9.3GB. For the table definition or the initial data for appended 
> data

IIRC, I thought the scale factor of 1,000 is 1.5 GB.  What displayed 9.3 GB?  
SELECT pg_database_size() or something?


Below are review comments:


(1)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
                return EndPos;
        }
 
+       /* Issues WAL only for transaction end and check point */
+       if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+       {
+               XLogResetInsertion();
+               return GetLatestCheckPointLSN();
+       }
+
        do

This does not emit transaction completion WAL records.  Their RM ID is 
RM_XACT_ID.  Also, RM_XLOG_ID includes other kinds of WAL records than the 
checkpoint WAL record.  Correct the comment accordingly.  I don't have a good 
idea on how to represent the RM_XLOG_ID,, but the following might help:

[rmgrlist.h]
/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, 
NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, 
NULL, NULL)


(2)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
@@ -2591,10 +2591,10 @@ include_dir 'conf.d'
         data to support WAL archiving and replication, including running
         read-only queries on a standby server. <literal>minimal</literal> 
removes all
         logging except the information required to recover from a crash or
-        immediate shutdown.  Finally,
+        immediate shutdown.  <literal>none</literal> generates no WAL in any 
case. Finally,

According to the previous code, "none" emits some kinds of WAL records.  So I 
think we need to come up with a good name and/or description.



(3)
         <literal>logical</literal> adds information necessary to support 
logical
-        decoding.  Each level includes the information logged at all lower
-        levels.  This parameter can only be set at server start.
+        decoding.  Each level except for <literal>none</literal> includes the
+        information logged at all lower levels.  This parameter can only be 
set at server start.
        </para>

Why is this change necessary?


(4)
+        On the other hand, an unexpected crash of the server makes the 
database cluster
+        inconsistent. For that reason, before utilizing this level, get a full 
backup of the cluster and
+        backup of the entire operations that are done under the condition that
+        <varname>wal_level</varname> is <literal>none</literal>.

This gives the impression that the user can start the database server and see 
inconsistent data.  The reality is that the database server does not start, 
isn't it?


(5)
@@ -1751,7 +1752,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND 
somethingelse;
     Aside from avoiding the time for the archiver or WAL sender to process the
     WAL data, doing this will actually make certain commands faster, because
     they do not to write WAL at all if <varname>wal_level</varname>
-    is <literal>minimal</literal> and the current subtransaction (or top-level
+    is either <literal>minimal</literal> or <literal>minimal</literal>
+    and the current subtransaction (or top-level
     transaction) created or truncated the table or index they change.  (They
     can guarantee crash safety more cheaply by doing
     an <function>fsync</function> at the end than by writing WAL.)

This is not correct.  In minimal, some conditions need to hold true for WAL to 
not be generated as described above.  OTOH, wal_level = none does not generate 
WAL unconditionally.


(6)
14.4.9. Some Notes about pg_dump
...
If using WAL archiving or streaming replication, consider disabling them during 
the restore. To do that, set archive_mode to off, wal_level to minimal, and 
max_wal_senders to zero before loading the dump. Afterwards, set them back to 
the right values and take a fresh base backup.

Why don't you refer to wal_level = none here as well?


(7)
@@ -918,10 +918,13 @@ PostmasterMain(int argc, char *argv[])
                                         ReservedBackends, MaxConnections);
                ExitPostmaster(1);
        }
-       if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == 
WAL_LEVEL_MINIMAL)
+       if ((XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_NONE) 
||
+               (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == 
WAL_LEVEL_MINIMAL))
                ereport(ERROR,
-                               (errmsg("WAL archival cannot be enabled when 
wal_level is \"minimal\"")));
-       if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+                               (errmsg("WAL archival cannot be enabled when 
wal_level is \"%s\"",
+                                               wal_level == WAL_LEVEL_MINIMAL 
? "minimal" : "none")));
+       if ((max_wal_senders > 0 && wal_level == WAL_LEVEL_NONE) ||
+               (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL))
                ereport(ERROR,
                                (errmsg("WAL streaming (max_wal_senders > 0) 
requires wal_level \"replica\" or \"logical\"")));

This style of writing conitions is redundant.  You can just change the code to 
wal_level <= WAL_LEVEL_MINIMAL.
Also, the first message can be '"minimal" or "none"' like the second one.


(8)
@@ -989,6 +992,12 @@ PostmasterMain(int argc, char *argv[])
        LocalProcessControlFile(false);
 
        /*
+        * Check some conditions specific to wal_level='none' and ensures the
+        * database isn't inconsistent.
+        */
+       SafelyFinishedNoneLevel();
+
+       /*

This check should be moved to around the beginning of StartupXLOG().  
PostmasterMain() is called in multi-user mode.  It's not called in single-user 
mode (postgres --single).
The new function is not necessary.  StartupXLOG() can see the control file 
contents directly.



(9)
+       /*
+        * Detect if we previously crashed under wal_level='none' or not.
+        */
+       unexpected_shutdown = ControlFile->state != DB_SHUTDOWNED &&
+               ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY;
+       if ((ControlFile->wal_level == WAL_LEVEL_NONE && unexpected_shutdown))
+       {
+               ereport(ERROR,
+                               (errmsg("Server was unexpectedly shut down when 
WAL logging was disabled"),
+                                errhint("It looks like you need to deploy a 
new cluster from your full backup again.")));
+       }
+}

Refine the message according to the following message guideline.  For example, 
the primary message has to start with a lower case letter.

https://www.postgresql.org/docs/devel/source.html



(10)
You need to edit this to add none:
src/backend/utils/misc/postgresql.conf.sample


(11)
src/include/access/xlogdefs.h
src/backend/access/transam/varsup.c

Consider modifying the comments in these files that refer to wal_level.  Maybe 
wal_level <= minimal is enough?


(12)
src/include/utils/rel.h

Modify the RelationNeedsWAL() so that it returns false when wal_level = none.  
Adding wal_level != WAL_LEVEL_NONE is would probably be okay.


(13)
@@ -161,7 +161,8 @@ extern int  XLogArchiveMode;
 /* WAL levels */
 typedef enum WalLevel
 {
-       WAL_LEVEL_MINIMAL = 0,
+       WAL_LEVEL_NONE = 0,
+       WAL_LEVEL_MINIMAL,
        WAL_LEVEL_REPLICA,
        WAL_LEVEL_LOGICAL
 } WalLevel;

I'm a bit concerned about if we can change the values of existing symbols, 
because wal_level setting is stored in pg_control file.  Having a quick look at 
source code, there seems to be no problem.  However, I'm not sure about 
pg_upgrade.  Can you try, for example,

1. Create the database cluster with an older version, say, PG 13.
2. Start and stop the database server.
3. Run pg_controldata and see that it prints replica for the wal_level value.
4. Upgrade the database cluster with pg_upgrade.
5. Run pg_controldata and see the wal_level value.

If you change the values of existing symbols like your patch, you need to bump 
PG_CONTROL_VERSION.
If you find we can't tchange the existing values, you can probably set 
WAL_LEVEL_NONE to -1.


Regards
Takayuki Tsunakawa



Reply via email to