Re: Additional Chapter for Tutorial

Justin Pryzby Sat, 31 Oct 2020 14:35:38 -0700

On Fri, Oct 30, 2020 at 05:45:00PM +0100, Erik Rijkers wrote:
> On 2020-10-30 11:57, Jürgen Purtz wrote:
> > On 26.10.20 15:53, David G. Johnston wrote:
> > > Removing -docs as moderation won’t let me cross-post.
> > > 
> 
> Hi,
> 
> I applied 0009-architecture-vs-master.patch to head
> and went through architecture.sgml (only that file),
> then produced the attached .diff


Now I applied 0009 as well as Erik's changes and made some more of my own :)

I'm including all patches so CFBOT is happy.

> 3.
> 'accesses' seems a somewhat strange word most of the time just 'access' may
> be better.  Not sure - native speaker wanted. (no changes made)

You're right, and I included that part.

-- 
Justin

From 25cbdab1a1266861c23062501f7e2b9efd2675e3 Mon Sep 17 00:00:00 2001
From: Erik Rijkers <e...@xs4all.nl>
Date: Fri, 30 Oct 2020 17:45:00 +0100
Subject: [PATCH 1/3] Additional Chapter for Tutorial

---
 doc/src/sgml/architecture.sgml | 142 +++++++++++++++------------------
 1 file changed, 66 insertions(+), 76 deletions(-)

diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml
index e547a87d08..ffdac61975 100644
--- a/doc/src/sgml/architecture.sgml
+++ b/doc/src/sgml/architecture.sgml
@@ -19,19 +19,18 @@
     In the case of <productname>PostgreSQL</productname>, the server
     launches a single process for each client connection, referred to as a
     <glossterm linkend="glossary-backend">Backend</glossterm> process.
-    Those Backend processes handle the client's requests by acting on the
+    Such a Backend process handles the client's requests by acting on the
     <glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>.
     This leads to other activities (file access, WAL, vacuum, ...) of the
     <glossterm linkend="glossary-instance">Instance</glossterm>. The
     Instance is a group of server-side processes acting on a common
-    Shared Memory. Notably, PostgreSQL does not utilize application
-    threading within its implementation.
+    Shared Memory. PostgreSQL does not utilize threading.
    </para>
 
    <para>
-    The first step in an Instance start is the start of the
+    The first step when an Instance starts is the start of the
     <glossterm linkend="glossary-postmaster">Postmaster</glossterm>.
-    He loads the configuration files, allocates Shared Memory, and
+    It loads the configuration files, allocates Shared Memory, and
     starts the other processes of the Instance:
     <glossterm linkend="glossary-background-writer">Background Writer</glossterm>,
     <glossterm linkend="glossary-checkpointer">Checkpointer</glossterm>,
@@ -66,32 +65,32 @@
    <para>
     When a client application tries to connect to a
     <glossterm linkend="glossary-database">database</glossterm>,
-    this request is handled initially by the Postmaster. He
+    this request is handled initially by the Postmaster. It
     starts a new Backend process and instructs the client
     application to connect to it. All further client requests
-    go to this process and are handled by it.
+    are handled by this process.
    </para>
 
    <para>
     Client requests like <command>SELECT</command> or
     <command>UPDATE</command> usually lead to the
-    necessity to read or write some data. This is carried out
+    necessity to read or write data. This is carried out
     by the client's backend process. Reads involve a page-level
-    cache housed in Shared Memory (for details see:
+    cache, located in Shared Memory (for details see:
     <xref linkend="sysvipc"/>) for the benefit of all processes
-    in the instance. Writes also involve this cache, in additional
+    in the instance. Writes also use this cache, in addition
     to a journal, called a write-ahead-log or WAL.
    </para>
 
    <para>
-    Shared Memory is limited in size. Thus, it becomes necessary
+    Shared Memory is limited in size and it can become necessary
     to evict pages. As long as the content of such pages hasn't
     changed, this is not a problem. But in Shared Memory also
     write actions take place. Modified pages are called dirty
     pages or dirty buffers and before they can be evicted they
-    must be written back to disk. This happens regularly by the
+    must be written to disk. This happens regularly by the
     Background Writer and the Checkpointer process to ensure
-    that the disk version of the pages are kept up-to-date.
+    that the disk version of the pages are up-to-date.
     The synchronisation from RAM to disk consists of two steps.
    </para>
 
@@ -109,7 +108,7 @@
     Shared Memory. The parallel running WAL Writer process
     reads them and appends them to the end of the current
     <glossterm linkend="glossary-wal-record">WAL file</glossterm>.
-    Such sequential writes are much faster than writes to random
+    Such sequential writes are faster than writes to random
     positions of heap and index files. All WAL records created
     out of one dirty page must be transferred to disk before the
     dirty page itself can be transferred to disk in the second step.
@@ -119,19 +118,19 @@
     Second, the transfer of dirty buffers from Shared Memory to
     files must take place. This is the primary task of the
     Background Writer process. Because I/O activities can block
-    other processes significantly, it starts periodically and
+    other processes, it starts periodically and
     acts only for a short period. Doing so, its extensive (and
     expensive) I/O activities are spread over time, avoiding
-    debilitating I/O peaks. Also, the Checkpointer process
-    transfers dirty buffers to file.
+    debilitating I/O peaks. The Checkpointer process
+    also transfers dirty buffers to file.
    </para>
 
    <para>
-    The Checkpointer creates
+    The Checkpointer process creates
     <glossterm linkend="glossary-checkpoint">Checkpoints</glossterm>.
     A Checkpoint is a point in time when all older dirty buffers,
     all older WAL records, and finally a special Checkpoint record
-    have been written and flushed to disk. Heap and index files
+    are written and flushed to disk. Heap and index files
     on the one hand and WAL files on the other hand are in sync.
     Previous WAL is no longer required. In other words,
     a possibly occurring recovery, which integrates the delta
@@ -141,13 +140,13 @@
    </para>
 
    <para>
-    While the Checkpointer ensures that a running system can crash
+    While the Checkpointer ensures that the database system can crash
     and restart itself in a valid state, the administrator needs
     to handle the case where the heap and files themselves become
     corrupted (and possibly the locally written WAL, though that is
     less common). The options and details are covered extensively
     in the backup and restore section (<xref linkend="backup"/>).
-    For our purposes here, note just that the WAL Archiver process
+    For our purposes here, just note that the WAL Archiver process
     can be enabled and configured to run a script on filled WAL
     files &mdash; usually to copy them to a remote location.
    </para>
@@ -234,13 +233,13 @@
    <para>
     Every database must contain at least one schema because all
     <glossterm linkend="glossary-sql-object">SQL Objects</glossterm>
-    are contained in a schema.
-    Schemas are namespaces for their SQL objects and ensure
-    (with one exception) that within their scope names are used
-    only once across all types of SQL objects. E.g., it is not possible
+    must be contained in a schema.
+    Schemas are namespaces for SQL objects and ensure
+    (with one exception) that the SQL object names are used only once within
+    their scope across all types of SQL objects. E.g., it is not possible
     to have a table <literal>employee</literal> and a view
     <literal>employee</literal> within the same schema. But it is
-    possible to have two tables <literal>employee</literal> in
+    possible to have two tables <literal>employee</literal> in two
     different schemas. In this case, the two tables
     are separate objects and independent of each
     other. The only exception to this cross-type uniqueness is that
@@ -273,7 +272,7 @@
     <firstterm>Global SQL Objects</firstterm>, are outside of the
     strict hierarchy: All <firstterm>database names</firstterm>,
     all <firstterm>tablespace names</firstterm>, and all
-    <firstterm>role names</firstterm> are automatically known and
+    <firstterm>role names</firstterm> are automatically
     available throughout the cluster, independent from
     the database or schema in which they where defined originally.
     <xref linkend="tutorial-internal-objects-hierarchy-figure"/>
@@ -302,7 +301,7 @@
    <title>The physical Perspective: Directories and Files</title>
 
    <para>
-    <productname>PostgreSQL</productname> organizes long-lasting
+    <productname>PostgreSQL</productname> organizes long-lasting (persistent)
     data as well as volatile state information about transactions
     or replication actions in the file system. Every
     <xref linkend="glossary-db-cluster"/> has its root directory
@@ -352,20 +351,19 @@
     every table and every index to store heap and index
     data. Those files are accompanied by files for the
     <link linkend="storage-fsm">Free Space Maps</link>
-    (extension <literal>_fsm</literal>) and
+    (suffixed <literal>_fsm</literal>) and
     <link linkend="storage-vm">Visibility Maps</link>
-    (extension <literal>_vm</literal>), which contain optimization information.
+    (suffixed <literal>_vm</literal>), which contain optimization information.
    </para>
 
    <para>
-    Another subdirectory is <literal>global</literal>.
-    In analogy to the database-specific
-    subdirectories, there are files containing information about
+    Another subdirectory is <literal>global</literal>, which
+    contains files with information about
     <glossterm linkend="glossary-sql-object">Global SQL Objects</glossterm>.
     One type of such Global SQL Objects are
     <glossterm linkend="glossary-tablespace">tablespaces</glossterm>.
     In <literal>global</literal> there is information about
-    the tablespaces, not the tablespaces themselves.
+    the tablespaces; not the tablespaces themselves.
    </para>
 
    <para>
@@ -392,11 +390,11 @@
    <para>
     In the root directory <literal>data</literal>
     there are also some files. In many cases, the configuration
-    files of the cluster are stored here. As long as the
+    files of the cluster are stored here. If the
     instance is up and running, the file
     <literal>postmaster.pid</literal> exists here
     and contains the process ID (pid) of the
-    Postmaster which has started the instance.
+    Postmaster which started the instance.
    </para>
 
    <para>
@@ -411,7 +409,7 @@
 
    <para>
     In most cases, <productname>PostgreSQL</productname> databases
-    support many clients at the same time. Therefore, it is necessary to
+    support many clients at the same time which makes it necessary to
     protect concurrently running requests from unwanted overwriting
     of other's data as well as from reading inconsistent data. Imagine an
     online shop offering the last copy of an article. Two clients have the
@@ -432,11 +430,11 @@
     <productname>PostgreSQL</productname> implements a third, more
     sophisticated technique: <firstterm>Multiversion Concurrency
     Control</firstterm> (MVCC). The crucial advantage of MVCC
-    over other technologies gets evident in multiuser OLTP
+    over other technologies becomes evident in multiuser OLTP
     environments with a massive number of concurrent write
     actions. There, MVCC generally performs better than solutions
     using locks. In a <productname>PostgreSQL</productname>
-    database reading never blocks writing and writing never
+    database, reading never blocks writing and writing never
     blocks reading, even in the strictest level of transaction
     isolation.
    </para>
@@ -444,14 +442,14 @@
    <para>
     Instead of locking rows, the <firstterm>MVCC</firstterm> technique creates
     a new version of the row when a data-change takes place. To
-    distinguish between these two versions and to track the timeline
+    distinguish between these two versions, and to track the timeline
     of the row, each of the versions contains, in addition to their user-defined
     columns, two special system columns, which are not visible
     for the usual <command>SELECT * FROM ...</command> command.
     The column <literal>xmin</literal> contains the transaction ID (xid)
-    of the transaction, which created this version of the row. Accordingly,
-    <literal>xmax</literal> contains the xid of the transaction, which has
-    deleted this version, or zero, if the version is not
+    of the transaction which created this version of the row.
+    <literal>xmax</literal> contains the xid of the transaction which has
+    deleted this version, or zero if the version is not
     deleted. You can read both with the command
     <command>SELECT xmin, xmax, * FROM ... </command>.
    </para>
@@ -469,7 +467,7 @@
    </para>
 
    <para>
-    The description in this chapter simplifies by omitting some details.
+    The description in this chapter simplifies by omitting details.
     When many transactions are running simultaneously, things can
     get complicated. Sometimes transactions get aborted via
     <command>ROLLBACK</command> immediately or after a lot of other activities, sometimes
@@ -526,8 +524,8 @@
     creates a new version of the row with its xid in
     <literal>xmin</literal>, <literal>0</literal> in
     <literal>xmax</literal>, and <literal>'y'</literal> in the
-    user data (plus all the other user data from the old version).
-    This version is now valid for all coming transactions.
+    user data (plus all other user data from the old version).
+    This version is now valid for all future transactions.
    </para>
 
    <para>
@@ -624,9 +622,9 @@
     <para>
      Autovacuum runs automatically by
      default. Its default parameters as well as such for
-     <command>VACUUM</command> fit well for most standard
+     <command>VACUUM</command> are appropriate for most standard
      situations. Therefore a novice database manager can
-     easily skip the rest of this chapter which explains
+     skip the rest of this chapter which explains
      a lot of details.
     </para>
    </note>
@@ -687,7 +685,7 @@
 
    <para>
     The eagerness &mdash; you can call it 'aggression' &mdash; of the
-    operations <emphasis>eliminating bloat</emphasis> and
+    operations for <emphasis>eliminating bloat</emphasis> and
     <emphasis>freeze</emphasis> is controlled by configuration
     parameters, runtime flags, and in extreme situations by
     the processes themselves. Because vacuum operations typically are I/O
@@ -783,7 +781,7 @@
        When a client issues the SQL command <command>VACUUM</command>
        with the option <command>FULL</command>.
        Also, in this mode, the bloat disappears, but the strategy used
-       is very different: In this case, the complete table is copied
+       is very different: in this case, the complete table is copied
        to a different file skipping all outdated row versions. This
        leads to a significant reduction of used disk space because
        the new file contains only the actual data. The old file
@@ -1143,7 +1141,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     atomicity: either all or none of its operations succeed,
     regardless of the fact that it may consist of a lot of
     different write-operations, and each such operation may
-    affect thousands or millions of rows. As soon as one of the
+    affect many rows. As soon as one of the
     operations fails, all previous operations fail also, which
     means that all modified rows retain their values as of the
     beginning of the transaction.
@@ -1157,14 +1155,14 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     &mdash; even in the lowest
     <link linkend="transaction-iso">isolation level</link>
     of transactions. <productname>PostgreSQL</productname>
-    does never show uncommitted changes to other connections.
+    never shows uncommitted changes to other connections.
    </para>
 
    <para>
     The situation regarding visibility is somewhat different
     from the point of view of the modifying transaction.
-    <command>SELECT</command> commands issued inside a
-    transaction delivers all changes done so far by this
+    A <command>SELECT</command> command issued inside a
+    transaction shows all changes done so far by this
     transaction.
    </para>
 
@@ -1231,7 +1229,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
    <para>
     Transactions ensure that the
     <glossterm linkend="glossary-consistency">consistency</glossterm>
-    of the complete database always keeps valid. Declarative
+    of the complete database always remains valid. Declarative
     rules like
     <link linkend="ddl-constraints-primary-keys">primary</link>- or
     <link linkend="ddl-constraints-fk">foreign keys</link>,
@@ -1241,13 +1239,6 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     are part of the all-or-nothing nature of transactions.
    </para>
 
-   <para>
-    Also, all self-evident &mdash; but possibly not obvious
-    &mdash; low-level demands on the database system are
-    ensured; e.g. index entries for rows must become
-    visible at the same moment as the rows themselves.
-   </para>
-
    <para>
     There is the additional feature
     '<link linkend="transaction-iso">isolation level</link>',
@@ -1287,7 +1278,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     a severe software error like a null pointer exception.
     Because <productname>PostgreSQL</productname> uses a
     client/server architecture, no direct problem for the
-    database will occur. In all of this cases, the
+    database will occur. In all of these cases, the
     <glossterm linkend="glossary-backend">Backend process</glossterm>,
     which is the client's counterpart at the server-side,
     may recognize that the network connection is no longer
@@ -1310,7 +1301,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     automatically recognizes that the last shutdown of the
     instance did not happen as expected: files might not be
     closed properly and the <literal>postmaster.pid</literal>
-    file exists. <productname>PostgreSQL</productname>
+    file unexpectedly exists. <productname>PostgreSQL</productname>
     tries to clean up the situation. This is possible because
     all changes in the database are stored twice. First,
     the WAL files contain them as a chronology of
@@ -1328,8 +1319,8 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     <glossterm linkend="glossary-checkpoint">checkpoint</glossterm>.
     This checkpoint signals that the database files are in
     a consistent state, especially that all WAL records up to
-    this point were successfully stored in heap and index. Starting
-    here, the recovery process copies the following WAL records
+    this point were successfully stored in heap and index files. Starting
+    here, the recovery process copies the remaining WAL records
     to heap and index. As a result, the files contain all
     changes and reach a consistent state. Changes of committed
     transactions are visible; those of uncommited transactions
@@ -1344,10 +1335,10 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
    <bridgehead renderas="sect3">Disk crash</bridgehead>
    <para>
     If a disk crashes, the course of action described previously
-    cannot work. It is likely that the WAL files and/or the
+    cannot work: it is likely that the WAL files and/or the
     data and index files are no longer available. The
     database administrator must take special actions to
-    overcome such situations.
+    prepare for such a situation.
    </para>
    <para>
     He obviously needs a backup. How to take such a backup
@@ -1427,7 +1418,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
    </para>
    <para>
     The obvious disadvantage of this method is that there
-    is a downtime where no user interaction is possible.
+    is a downtime.
     The other two strategies run during regular operating
     times.
    </para>
@@ -1456,14 +1447,14 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
    <bridgehead renderas="sect2">Continuous archiving based on pg_basebackup and WAL files</bridgehead>
    <para>
     <link linkend="continuous-archiving">This method</link>
-    is the most sophisticated and complex one. It
+    is the most sophisticated and most complex one. It
     consists of two phases.
    </para>
    <para>
-    First, you need to create a so called
+    First, you need to create a so-called
     <firstterm>basebackup</firstterm> with the tool
     <command>pg_basebackup</command>. The result is a
-    directory structure plus files which contains a
+    directory structure plus files which contain a
     consistent copy of the original cluster.
     <command>pg_basebackup</command> runs in
     parallel to other processes in its own transaction.
@@ -1484,7 +1475,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     <glossterm linkend="glossary-wal-archiver">Archiver process</glossterm>
     will automatically copy every single WAL file to a save location.
     <link linkend="backup-archiving-wal">Its configuration</link>
-    consists mainly of a string, which contains a copy command
+    consists mainly of a string that contains a copy command
     in the operating system's syntax. In order to protect your
     data against a disk crash, the destination location
     of a basebackup as well as of the
@@ -1492,9 +1483,8 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     disk which is different from the data disk.
    </para>
    <para>
-    If it gets necessary to restore the cluster, you have to
-    copy the basebackup and the
-    archived WAL files to
+    If it becomes necessary to restore the cluster, you have to
+    copy the basebackup and the archived WAL files to
     their original directories. The configuration of this
     <link linkend="backup-pitr-recovery">recovery procedure</link>
     contains a string with the reverse copy command: from
-- 
2.17.0

From 7712793cd1be3ff17b3f32668f08101fbbaa6eda Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 31 Oct 2020 15:48:00 -0500
Subject: [PATCH 2/3] Fix "accesses" per suggestion from Erik

---
 doc/src/sgml/architecture.sgml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml
index ffdac61975..6f819220dc 100644
--- a/doc/src/sgml/architecture.sgml
+++ b/doc/src/sgml/architecture.sgml
@@ -158,7 +158,7 @@
 -->
 
    <para>
-    The Statistics Collector collects counters about accesses to
+    The Statistics Collector collects counters about access to
     SQL objects like tables, rows, indexes, pages, and more. It
     stores the obtained information in system tables.
    </para>
@@ -423,7 +423,7 @@
 
    <para>
     A first approach to implement protections against concurrent
-    accesses to the same data may be the locking of critical
+    access to the same data may be the locking of critical
     rows. Two such techniques are:
     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
     and <emphasis>Two Phase Locking</emphasis> (2PL).
@@ -479,7 +479,7 @@
    </para>
 
    <para>
-    So, what's going on in detail when write accesses take place?
+    So, what's going on in detail when write access takes place?
     <xref linkend="tutorial-mvcc-figure"/> shows details concerning
     <literal>xmin</literal>, <literal>xmax</literal>, and user data.
    </para>
@@ -1059,7 +1059,7 @@
     The setting of the flags is silently done by <command>VACUUM</command>
     and Autovacuum during their bloat and freeze operations.
     This is done to speed up future vacuum actions,
-    regular accesses to heap pages, and some accesses to
+    regular access to heap pages, and some access to
     the index. Every data-modifying operation on any row
     version of the page clears the flags.
    </para>
-- 
2.17.0

From fec3a8f5722b41a15327a4c0eb18f68ba07a51fd Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 31 Oct 2020 15:44:22 -0500
Subject: [PATCH 3/3] More fixes on top

---
 doc/src/sgml/architecture.sgml | 239 ++++++++++++++++-----------------
 1 file changed, 114 insertions(+), 125 deletions(-)

diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml
index 6f819220dc..f3acdaa6b3 100644
--- a/doc/src/sgml/architecture.sgml
+++ b/doc/src/sgml/architecture.sgml
@@ -66,9 +66,7 @@
     When a client application tries to connect to a
     <glossterm linkend="glossary-database">database</glossterm>,
     this request is handled initially by the Postmaster. It
-    starts a new Backend process and instructs the client
-    application to connect to it. All further client requests
-    are handled by this process.
+    starts a new Backend process to service the client's requests.
    </para>
 
    <para>
@@ -89,7 +87,7 @@
     write actions take place. Modified pages are called dirty
     pages or dirty buffers and before they can be evicted they
     must be written to disk. This happens regularly by the
-    Background Writer and the Checkpointer process to ensure
+    Checkpointer and Background Writer processes to ensure
     that the disk version of the pages are up-to-date.
     The synchronisation from RAM to disk consists of two steps.
    </para>
@@ -117,7 +115,7 @@
    <para>
     Second, the transfer of dirty buffers from Shared Memory to
     files must take place. This is the primary task of the
-    Background Writer process. Because I/O activities can block
+    Checkpointer process. Because I/O activities can block
     other processes, it starts periodically and
     acts only for a short period. Doing so, its extensive (and
     expensive) I/O activities are spread over time, avoiding
@@ -136,7 +134,9 @@
     a possibly occurring recovery, which integrates the delta
     information of WAL into heap and index files, will happen
     by replaying only WAL past the last recorded checkpoint
-    on top of the current heap and files. This speeds up recovery.
+    on top of the current heap and files.
+    This limits the amount of WAL which needs to be replayed during recovery in
+    the event of a crash.
    </para>
 
    <para>
@@ -369,8 +369,8 @@
    <para>
     The subdirectory <literal>pg_wal</literal> contains the
     <glossterm linkend="glossary-wal-file">WAL files</glossterm>.
-    They arise and grow parallel to data changes in the
-    cluster and remain alive as long as
+    They arise and grow in parallel with data changes in the
+    cluster and remain as long as
     they are required for recovery, archiving, or replication.
    </para>
 
@@ -383,8 +383,8 @@
 
    <para>
     In <literal>pg_tblspc</literal>, there are symbolic links
-    that point to directories containing such SQL objects
-    that are created within tablespaces.
+    that point to directories containing SQL objects
+    that exist within a non-default tablespace.
    </para>
 
    <para>
@@ -459,7 +459,7 @@
     sequences. Every new transaction receives the next number as its ID.
     Therefore, this flow of xids represents the flow of transaction
     start events over time. But keep in mind that xids are independent of
-    any time measurement &mdash; in milliseconds or whatever. If you dive
+    any time measurement &mdash; in milliseconds or otherwise. If you dive
     deeper into <productname>PostgreSQL</productname>, you will recognize
     parameters with names such as 'xxx_age'. Despite their names,
     these '_age' parameters do not specify a period of time but represent
@@ -514,38 +514,36 @@
     executes an <command>UPDATE</command> of this row by
     changing the user data from <literal>'x'</literal> to
     <literal>'y'</literal>. According to the MVCC principles,
-    the data in the old version of the row does not change!
-    The value <literal>'x'</literal> remains as it was before.
-    Only <literal>xmax</literal> changes to <literal>135</literal>.
-    Now, this version is treated as valid exclusively for
+    the user data in the old version of the row is not changed!
+    Internally, an <command>UPDATE</command> command acts
+    as a <command>DELETE</command> command followed by
+    an <command>INSERT</command> command.
+    <literal>xmax</literal> of the old row version is changed to <literal>135</literal>,
+    and a new row version is added with
+    <literal>xmin</literal>=135,
+    <literal>xmax</literal>=0, and <literal>'y'</literal> in the
+    user data (plus any other user columns from the old version).
+    The old row version is visible only to
     transactions with xids from <literal>123</literal> to
-    <literal>134</literal>. As a substitute for the non-occurring
-    data change in the old version, the <command>UPDATE</command>
-    creates a new version of the row with its xid in
-    <literal>xmin</literal>, <literal>0</literal> in
-    <literal>xmax</literal>, and <literal>'y'</literal> in the
-    user data (plus all other user data from the old version).
-    This version is now valid for all future transactions.
+    <literal>134</literal>, and the new row version
+    is visible to all future transactions.
    </para>
 
    <para>
     All subsequent <command>UPDATE</command> commands behave
-    in the same way as the first one: they put their xid to
+    in the same way as the first one: they put their xid in
     <literal>xmax</literal> of the current version, create
-    the next version with their xid in <literal>xmin</literal>,
-    <literal>0</literal> in <literal>xmax</literal>, and the
-    new user data.
+    a new version with their xid in <literal>xmin</literal> and
+    <literal>0</literal> in <literal>xmax</literal>.
    </para>
 
    <para>
     Finally, a row may be deleted by a <command>DELETE</command>
     command. Even in this case, all versions of the row remain as
-    before. Nothing is thrown away so far! Only <literal>xmax</literal>
-    of the last version changes to the xid of the <command>DELETE</command>
-    transaction, which indicates that it is only valid for
-    transactions with xids older than its own (from
-    <literal>142</literal> to <literal>820</literal> in this
-    example).
+    before. Nothing is thrown away! Only <literal>xmax</literal>
+    of the last version is set to the xid of the <command>DELETE</command>
+    transaction, which indicates that (if committed) it is only visible to
+    transactions with xids older than that.
    </para>
 
    <para>
@@ -553,10 +551,10 @@
     of the same row in the table's heap file and leaves them there,
     even after a <command>DELETE</command> command. Only the youngest
     version is relevant for all future transactions. But the
-    system must also preserve some of the older ones for a
-    certain amount of time because the possibility exists that
-    they are or could become relevant for any pending
-    transactions. Over time, also the older ones get out of scope
+    system must also preserve some of the older ones for
+    awhile, because they could still be needed by
+    transactions which started before the deleting transaction commits.
+    Over time, the older ones also get out of scope
     for ALL transactions and therefore become unnecessary.
     Nevertheless, they do exist physically on the disk and occupy
    space.
@@ -571,26 +569,18 @@
      <simpara>
       <literal>xmin</literal> and <literal>xmax</literal>
       indicate the range from where to where
-      row versions are valid (visible) for transactions.
+      <firstterm>row versions</firstterm> are valid (visible) for transactions.
       This range doesn't imply any direct temporal meaning;
       the sequence of xids reflects only the sequence of
       transaction begin events. As
       xids grow, old row versions get out of scope over time.
-      If an old row version is no longer valid for ALL existing
-      transactions, it's called <firstterm>dead</firstterm>. The
-      space occupied by dead row versions is part of the
+      If an old row version is no longer relevant for ANY existing
+      transactions, it can be marked <firstterm>dead</firstterm>. The
+      space occupied by dead row versions is wasted space called
       <glossterm linkend="glossary-bloat">bloat</glossterm>.
      </simpara>
     </listitem>
 
-    <listitem>
-     <simpara>
-      Internally, an <command>UPDATE</command> command acts in the
-      same way as a <command>DELETE</command> command followed by
-      an <command>INSERT</command> command.
-     </simpara>
-    </listitem>
-
     <listitem>
      <simpara>
       Nothing gets wiped away &mdash; with the consequence that the database
@@ -610,12 +600,12 @@
 
    <para>
     As we have seen in the previous chapter, the database
-    tends to occupy more and more disk space, the
+    tends to occupy more and more disk space, caused by
     <glossterm linkend="glossary-bloat">bloat</glossterm>.
     This chapter explains how the SQL command
     <command>VACUUM</command> and the automatically running
     <firstterm>Autovacuum</firstterm> processes clean up
-    by eliminating bloat.
+    and avoid continued growth.
    </para>
 
    <note>
@@ -635,8 +625,8 @@
     special situations, or they start it in batch jobs which run
     periodically. Autovacuum processes run as part of the
     <link linkend="glossary-instance">Instance</link> at the server.
-    There is a constantly running Autovacuum daemon. It permanently
-    controls the state of all databases based on values that are collected by the
+    There is a constantly running Autovacuum daemon. It continuously
+    monitors the state of all databases based on values that are collected by the
     <link linkend="glossary-stats-collector">Statistics Collector</link>
     and starts Autovacuum processes whenever it detects
     certain situations. Thus, it's a dynamic behavior of
@@ -657,7 +647,7 @@
 
     <listitem>
      <simpara>
-      <firstterm>Freeze</firstterm>: Mark the youngest row version
+      <firstterm>Freeze</firstterm>: Mark an old row version
       as frozen. This means that the version
       is always treated as valid (visible) independent from
       the <firstterm>wraparound problem</firstterm> (see below).
@@ -684,22 +674,22 @@
    </itemizedlist>
 
    <para>
-    The eagerness &mdash; you can call it 'aggression' &mdash; of the
+    The eagerness &mdash; you can call it 'aggressiveness' &mdash; of the
     operations for <emphasis>eliminating bloat</emphasis> and
     <emphasis>freeze</emphasis> is controlled by configuration
     parameters, runtime flags, and in extreme situations by
     the processes themselves. Because vacuum operations typically are I/O
     intensive, which can hinder other activities, Autovacuum
     avoids performing many vacuum operations in bulk. Instead,
-    it carries out many small actions with time gaps in between.
-    The SQL command <command>VACUUM</command> runs immediately
-    and without any time gaps.
+    it carries out many small actions with delay points in between.
+    When invoked manually, the SQL command <command>VACUUM</command>
+    runs immediately and (by default) without any time delay.
    </para>
 
    <bridgehead renderas="sect2">Eliminate Bloat</bridgehead>
 
    <para>
-    To determine which of the row versions are superfluous, the
+    To determine which of the row versions are no longer needed, the
     elimination operation must evaluate <literal>xmax</literal>
     against several criteria which all must apply:
    </para>
@@ -740,14 +730,13 @@
    </itemizedlist>
 
    <para>
-    After the vacuum operation detects a superfluous row version, it
+    After the vacuum operation detects an unused row version, it
     marks its space as free for future use of writing actions. Only
     in rare situations (or in the case of <command>VACUUM FULL</command>),
-    this space is released to the operating system. In most cases,
+    is this space released to the operating system. In most cases,
     it remains occupied by <productname>PostgreSQL</productname>
     and will be used by future <command>INSERT</command> or
-    <command>UPDATE</command> commands concerning this row or a
-    completely different one.
+    <command>UPDATE</command> commands to this table.
    </para>
 
    <para>
@@ -761,9 +750,9 @@
        in its default format, i.e., without any option. To boost performance,
        in this and the next case <command>VACUUM</command> does not
        read and act on all pages of the heap.
-       The Visibility Map, which is very compact and therefore has a small
-       size, contains information about pages, where bloat-candidates might
-       be found. Only such pages are processed.
+       The Visibility Map, which is very compact and therefore fast to read,
+       contains information about which pages have no deleted row versions, and
+       can be skipped by vacuum.
       </simpara>
      </listitem>
 
@@ -771,7 +760,7 @@
       <simpara>
        When a client issues the SQL command <command>VACUUM</command>
        with the option <command>FREEZE</command>. (In this case,
-       it undertakes much more actions, see
+       it undertakes many more actions, see
        <link linkend="tutorial-freeze">Freeze Row Versions</link>.)
       </simpara>
      </listitem>
@@ -780,12 +769,11 @@
       <simpara>
        When a client issues the SQL command <command>VACUUM</command>
        with the option <command>FULL</command>.
-       Also, in this mode, the bloat disappears, but the strategy used
-       is very different: in this case, the complete table is copied
-       to a different file skipping all outdated row versions. This
-       leads to a significant reduction of used disk space because
-       the new file contains only the actual data. The old file
-       is deleted.
+       In this mode, an exclusive lock is taken, and
+       the whole table is copied to a different file, skipping all outdated row
+       versions.  All bloat is thereby eliminated, which
+       may lead to a significant reduction of used disk space.
+       The old file is deleted.
       </simpara>
      </listitem>
 
@@ -807,17 +795,17 @@
    <para>
     This logic only applies to row versions of the heap. Index entries
     don't use <literal>xmin/xmax</literal>. Nevertheless, such index
-    entries, which would lead to outdated row versions, are released
+    entries, which would lead to outdated row versions, are cleaned up
     accordingly.
    </para>
 
    <para>
     The above descriptions omit the fact that xids on a real computer
-    have a limited size. They count up in the same way as sequences, and after
-    a certain number of new transactions they are forced to restart
+    have a limited size, and after
+    a certain number of transactions they are forced to restart
     from the beginning, which is called <firstterm>wraparound</firstterm>.
     Therefore the terms 'old transaction' / 'young transaction' does
-    not always correlate with low / high values of xids. Near to the
+    not always correlate with low / high values of xids. Near the
     wraparound point, there are cases where <literal>xmin</literal> has
     a higher value than <literal>xmax</literal>, although their meaning
     is said to be older than <literal>xmax</literal>.
@@ -856,7 +844,7 @@
     and the corresponding transactions of <literal>xmin</literal>
     and <literal>xmax</literal> must be committed. However,
     <productname>PostgreSQL</productname> has to consider the
-    possibility of wraparounds.
+    possibility of wraparound.
     Therefore the decision becomes more complex. The general
     idea of the solution is to use the 'between
     <literal>xmin</literal> and <literal>xmax</literal>'
@@ -883,7 +871,7 @@
     <listitem>
      <simpara>
       With each newly created transaction the two split-points
-      move forward. When 'txid_current + 2^31' would reach a
+      move forward. If 'txid_current + 2^31' reached a
       row version with <literal>xmin</literal> equal to that value, it would
       immediately jump from 'past' to 'future' and would be
       no longer visible!
@@ -892,11 +880,11 @@
 
     <listitem>
      <simpara>
-      To avoid this unacceptable extinction of data, the vacuum
-      operation <firstterm>freeze</firstterm> clears the situation
-      long before the split-point is reached. It sets a flag
-      in the header of the row version, which completely eliminates
-      the future use of <literal>xmin/xmax</literal> and indicates
+      If not handled in some way, data inserted many transactions ago would become invisible.
+      The vacuum operation <firstterm>freeze</firstterm> avoids this
+      long before the split-point is reached by setting a flag
+      in the header of the row version which avoids
+      future comparison of its <literal>xmin/xmax</literal> and indicates
       that the version is valid not only in the 'past'-half
       but also in the 'future'-half as well as in all coming
       <glossterm linkend="glossary-xid">epochs</glossterm>.
@@ -943,19 +931,19 @@
        When a client issues the SQL command <command>VACUUM</command>
        with its <command>FREEZE</command> option. In this case, all
        pages are processed that are marked in the Visibility Map
-       to potentially have unfrozen rows.
+       as potentially having unfrozen rows.
       </simpara>
      </listitem>
      <listitem>
       <simpara>
        When a client issues the SQL command <command>VACUUM</command> without
-       any options but finds that there are xids older than
+       any options but there are xids older than
        <xref linkend="guc-vacuum-freeze-table-age"/>
        (default: 150 million) minus
        <xref linkend="guc-vacuum-freeze-min-age"/>
        (default: 50 million).
        As before, all pages are processed that are
-       marked in the Visibility Map to potentially have unfrozen
+       marked in the Visibility Map as potentially having unfrozen
        rows.
       </simpara>
      </listitem>
@@ -981,7 +969,7 @@
         <simpara>
          The process switches
          to an <emphasis>aggressive mode</emphasis> if it recognizes
-         that for the processed table their oldest xid exceeds
+         that for the processed table the oldest xid exceeds
          <xref linkend="guc-autovacuum-freeze-max-age"/>
          (default: 200 million). The value of the oldest unfrozen
          xid is stored per table in <literal>pg_class.relfrozenxid</literal>.
@@ -1037,22 +1025,21 @@
    <para>
     The <link linkend="glossary-vm">Visibility Map</link>
     (VM) contains two flags &mdash; stored as
-    two bits &mdash; for each page of the heap. If the first bit
-    is set, that indicates that the associated page does not
-    contain any bloat. If the second one is set, that indicates
-    that the page contains only frozen rows.
+    two bits &mdash; for each page of the heap. The first bit
+    indicates that the associated page does not
+    contain any bloat. The second bit indicates
+    that the page contains only frozen row versions.
    </para>
 
    <para>
      Please consider two details. First, in most cases a page
-     contains many rows, some of them in many versions.
+     contains many rows or row versions.
      However, the flags are associated with the page,
-     not with a row or a row version. The flags are set
+     not with an individual row version. The flags are set
      only under the condition that they are valid for ALL
      row versions of the page. Second, since there
      are only two bits per page, the VM is considerably
-     smaller than the heap. Therefore it is buffered
-     in RAM in almost all cases.
+     smaller than the heap.
    </para>
 
    <para>
@@ -1068,7 +1055,7 @@
     The <link linkend="glossary-fsm">Free Space Map</link>
     (FSM) tracks the amount of free space per page. It is
     organized as a highly condensed b-tree of (rounded) sizes.
-    As long as <command>VACUUM</command> or Autovacuum change
+    Whenever <command>VACUUM</command> or Autovacuum changes
     the free space on any processed page, they log the new
     values in the FSM in the same way as all other writing
     processes.
@@ -1077,18 +1064,19 @@
    <bridgehead renderas="sect2">Statistics</bridgehead>
 
    <para>
-    Statistic information helps the <link
+    Statistical information helps the <link
     linkend="planner-stats">Query Planner</link> to make optimal
     decisions for the generation of execution plans. This
     information can be gathered with the SQL commands
     <command>ANALYZE</command> or <command>VACUUM ANALYZE</command>.
-    But also Autovacuum processes gather
+    But Autovacuum processes also gather
     such information. Depending on the percentage of changed rows
-    per table <xref linkend="guc-autovacuum-analyze-scale-factor"/>,
+    <xref linkend="guc-autovacuum-analyze-scale-factor"/>,
+    and minimum number of changed rows <xref linkend="guc-autovacuum-analyze-threshold"/>,
     the Autovacuum daemon starts Autovacuum processes to collect
-    statistics per table. This dynamic invocation of analyze
-    operations allows <productname>PostgreSQL</productname> to
-    adopt queries to changing circumstances.
+    statistics per table. The automatic analysis
+    allows <productname>PostgreSQL</productname> to
+    adapt query execution to changing circumstances.
    </para>
 
    <para>
@@ -1149,7 +1137,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
 
    <para>
     The atomicity also affects the visibility of changes. No
-    connection running simultaneously to a data modifying
+    connection running simultaneously with a data modifying
     transaction will ever see any change before the
     transaction successfully executes a <command>COMMIT</command>
     &mdash; even in the lowest
@@ -1228,9 +1216,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
 
    <para>
     Transactions ensure that the
-    <glossterm linkend="glossary-consistency">consistency</glossterm>
-    of the complete database always remains valid. Declarative
-    rules like
+    database always remains
+    <glossterm linkend="glossary-consistency">consistent</glossterm>.
+    Declarative rules like
     <link linkend="ddl-constraints-primary-keys">primary</link>- or
     <link linkend="ddl-constraints-fk">foreign keys</link>,
     <link linkend="ddl-constraints-check-constraints">checks</link>,
@@ -1248,11 +1236,12 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
    </para>
 
    <para>
-    Lastly, it is worth to notice that changes done by a
-    committed transaction will survive all future application,
-    instance, or hardware failures. The next chapter
-    explains this
-    <glossterm linkend="glossary-durability">durability</glossterm>.
+    Lastly, it is worth noticing that changes done by a
+    committed transaction will survive all failures in the application or
+    database cluster.
+    The next chapter explains the
+    <glossterm linkend="glossary-durability">durability</glossterm>
+    guarantees.
    </para>
   </sect1>
 
@@ -1309,7 +1298,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     which include the new data values and information about commit
     actions. The WAL records are written first. Second,
     the data itself shall exist in the heap and index files.
-    In opposite to the WAL records, this part may or may
+    In contrast with the WAL records, this part may or may
     not have been transferred entirely from Shared Memory
     to the files.
    </para>
@@ -1321,15 +1310,15 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     a consistent state, especially that all WAL records up to
     this point were successfully stored in heap and index files. Starting
     here, the recovery process copies the remaining WAL records
-    to heap and index. As a result, the files contain all
-    changes and reach a consistent state. Changes of committed
-    transactions are visible; those of uncommited transactions
+    to heap and index. As a result, the heap files contain all
+    changes recorded to the WAL and reach a consistent state. Changes of committed
+    transactions are visible; those of uncommitted transactions
     are also in the files, but - as usual - they are never seen
-    by any of the following transactions because uncommited
+    by any of the following transactions because uncommitted
     changes are never shown. Such recovery actions run
     completely automatically, it is not necessary that a
     database administrator configure or start anything by
-    himself.
+    themselves.
    </para>
 
    <bridgehead renderas="sect3">Disk crash</bridgehead>
@@ -1341,7 +1330,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     prepare for such a situation.
    </para>
    <para>
-    He obviously needs a backup. How to take such a backup
+    They obviously need a backup. How to take such a backup
     and use it as a starting point for a recovery of the
     cluster is explained in more detail in the next
     <link linkend="tutorial-backup">chapter</link>.
@@ -1353,11 +1342,11 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     and there is no room for additional data. In this case,
     <productname>PostgreSQL</productname> stops accepting
     data-modifying commands or even terminates completely.
-    No data loss or data corruption will occur.
+    Committed data is neither lost nor corrupted.
    </para>
    <para>
-    To come out of such a situation, the administrator should
-    remove unused files from this disk. But he should never
+    To recover from such a situation, the administrator should
+    remove unused files from this disk. But they should never
     delete files from the
     <glossterm linkend="glossary-data-directory">data directory</glossterm>.
     Nearly all of them are necessary for the consistency
@@ -1428,9 +1417,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     The tool <command>pg_dump</command> is able to take a
     <link linkend="backup-dump">copy</link>
     of the complete cluster or certain parts of it. It stores
-    the copy in the form of SQL <command>CREATE</command> and
-    <command>INSERT</command> commands. It runs in
-    parallel to other processes in its own transaction.
+    the copy in the form of SQL commands like <command>CREATE</command> and
+    <command>COPY</command>. It runs in
+    parallel with other processes in its own transaction.
    </para>
    <para>
     The output of <command>pg_dump</command> may be used as
@@ -1457,7 +1446,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
     directory structure plus files which contain a
     consistent copy of the original cluster.
     <command>pg_basebackup</command> runs in
-    parallel to other processes in its own transaction.
+    parallel with other processes in its own transaction.
    </para>
    <para>
     The second step is recommended but not necessary. All
-- 
2.17.0
