Applied. ---------------------------------------------------------------------------
On Fri, Nov 30, 2012 at 10:43:29PM -0500, Bruce Momjian wrote: > On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote: > > > >> In any event, I think the documentation should caution that the > > > >> upgrade should not be deemed to be a success until after a system-wide > > > >> sync has been done. Even if we use the link rather than copy method, > > > >> are we sure that that is safe if the directories recording those links > > > >> have not been fsynced? > > > > > > > > OK, the above is something I have been thinking about, and obviously you > > > > have too. If you change fsync from off to on in a cluster, and restart > > > > it, there is no guarantee that the dirty pages you read from the kernel > > > > are actually on disk, because Postgres doesn't know they are dirty. > > > > They probably will be pushed to disk by the kernel in less than one > > > > minute, but still, it doesn't seem reliable. Should this be documented > > > > in the fsync section? > > > > > > > > Again, another reason not to use fsync=off, though your example of the > > > > file copy is a good one. As you stated, this is a problem with the file > > > > copy/link, independent of how Postgres handles the files. We can tell > > > > people to use 'sync' as root on Unix, but what about Windows? > > > > > > I'm pretty sure someone mentioned the way to do that on Windows in > > > this list in the last few months, but I can't seem to find it. I > > > thought it was the initdb fsync thread. > > > > Yep, the code is already in initdb to fsync a directory --- we just need > > a way for pg_upgrade to access it. > > I have developed the attached patch that does this. It basically adds > an --sync-only option to initdb, then turns off all durability in > pg_upgrade and has pg_upgrade run initdb --sync-only; this give us > another nice speedup! > > ------ SSD ---- -- magnetic --- > git patch git patch > 1 11.11 11.11 11.10 11.13 > 1000 20.57 19.89 20.72 19.30 > 2000 28.02 25.81 28.50 27.53 > 4000 42.00 43.59 46.71 46.84 > 8000 89.66 74.16 89.10 73.67 > 16000 157.66 135.98 159.97 153.48 > 32000 316.24 296.90 334.74 308.59 > 64000 814.97 715.53 797.34 727.94 > > (I am very happy with these times. Thanks to Jeff Janes for his > suggestions.) > > I have also added documentation to the 'fsync' configuration variable > warning about dirty buffers and recommending flushing them to disk > before the cluster is crash-recovery safe. > > I consider this patch ready for 9.3 application (meaning it is not a > prototype). > > -- > Bruce Momjian <br...@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + > diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c > new file mode 100644 > index c12f15b..63df529 > *** a/contrib/pg_upgrade/pg_upgrade.c > --- b/contrib/pg_upgrade/pg_upgrade.c > *************** main(int argc, char **argv) > *** 150,155 **** > --- 150,161 ---- > new_cluster.pgdata); > check_ok(); > > + prep_status("Sync data directory to disk"); > + exec_prog(UTILITY_LOG_FILE, NULL, true, > + "\"%s/initdb\" --sync-only \"%s\"", > new_cluster.bindir, > + new_cluster.pgdata); > + check_ok(); > + > create_script_for_cluster_analyze(&analyze_script_file_name); > create_script_for_old_cluster_deletion(&deletion_script_file_name); > > diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c > new file mode 100644 > index 49d4c8f..05d8cc0 > *** a/contrib/pg_upgrade/server.c > --- b/contrib/pg_upgrade/server.c > *************** start_postmaster(ClusterInfo *cluster) > *** 209,217 **** > * a gap of 2000000000 from the current xid counter, so autovacuum will > * not touch them. > * > ! * synchronous_commit=off improves object creation speed, and we > only > ! * modify the new cluster, so only use it there. If there is a > crash, > ! * the new cluster has to be recreated anyway. > */ > snprintf(cmd, sizeof(cmd), > "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p > %d%s%s%s%s\" start", > --- 209,217 ---- > * a gap of 2000000000 from the current xid counter, so autovacuum will > * not touch them. > * > ! * Turn off durability requirements to improve object creation speed, > and > ! * we only modify the new cluster, so only use it there. If there is a > ! * crash, the new cluster has to be recreated anyway. > */ > snprintf(cmd, sizeof(cmd), > "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p > %d%s%s%s%s\" start", > *************** start_postmaster(ClusterInfo *cluster) > *** 219,225 **** > (cluster->controldata.cat_ver >= > BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" : > " -c autovacuum=off -c > autovacuum_freeze_max_age=2000000000", > ! (cluster == &new_cluster) ? " -c > synchronous_commit=off" : "", > cluster->pgopts ? cluster->pgopts : "", socket_string); > > /* > --- 219,226 ---- > (cluster->controldata.cat_ver >= > BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" : > " -c autovacuum=off -c > autovacuum_freeze_max_age=2000000000", > ! (cluster == &new_cluster) ? > ! " -c synchronous_commit=off -c fsync=off -c > full_page_writes=off" : "", > cluster->pgopts ? cluster->pgopts : "", socket_string); > > /* > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > new file mode 100644 > index b56070b..b7df8ce > *** a/doc/src/sgml/config.sgml > --- b/doc/src/sgml/config.sgml > *************** include 'filename' > *** 1697,1702 **** > --- 1697,1711 ---- > </para> > > <para> > + For reliable recovery when changing <varname>fsync</varname> > + off to on, it is necessary to force all modified buffers in the > + kernel to durable storage. This can be done while the cluster > + is shutdown or while fsync is on by running <command>initdb > + --sync-only</command>, running <command>sync</>, unmounting the > + file system, or rebooting the server. > + </para> > + > + <para> > In many situations, turning off <xref > linkend="guc-synchronous-commit"> > for noncritical transactions can provide much of the potential > performance benefit of turning off <varname>fsync</varname>, without > diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml > new file mode 100644 > index 08ee37e..a1e46eb > *** a/doc/src/sgml/ref/initdb.sgml > --- b/doc/src/sgml/ref/initdb.sgml > *************** PostgreSQL documentation > *** 245,250 **** > --- 245,261 ---- > </varlistentry> > > <varlistentry> > + <term><option>-S</option></term> > + <term><option>--sync-only</option></term> > + <listitem> > + <para> > + Safely write all database files to disk and exit. This does not > + perform any of the normal <application>initdb</> operations. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > <term><option>-T <replaceable>CFG</></option></term> > <term><option>--text-search-config=<replaceable>CFG</></option></term> > <listitem> > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c > new file mode 100644 > index 402504b..8c0a9f4 > *** a/src/bin/initdb/initdb.c > --- b/src/bin/initdb/initdb.c > *************** static const char *authmethodlocal = ""; > *** 118,123 **** > --- 118,124 ---- > static bool debug = false; > static bool noclean = false; > static bool do_sync = true; > + static bool sync_only = false; > static bool show_setting = false; > static char *xlog_dir = ""; > > *************** usage(const char *progname) > *** 2796,2801 **** > --- 2797,2803 ---- > printf(_(" -n, --noclean do not clean up after errors\n")); > printf(_(" -N, --nosync do not wait for changes to be > written safely to disk\n")); > printf(_(" -s, --show show internal settings\n")); > + printf(_(" -S, --sync-only only sync data directory\n")); > printf(_("\nOther options:\n")); > printf(_(" -V, --version output version information, then > exit\n")); > printf(_(" -?, --help show this help, then exit\n")); > *************** main(int argc, char *argv[]) > *** 3445,3450 **** > --- 3447,3453 ---- > {"show", no_argument, NULL, 's'}, > {"noclean", no_argument, NULL, 'n'}, > {"nosync", no_argument, NULL, 'N'}, > + {"sync-only", no_argument, NULL, 'S'}, > {"xlogdir", required_argument, NULL, 'X'}, > {NULL, 0, NULL, 0} > }; > *************** main(int argc, char *argv[]) > *** 3476,3482 **** > > /* process command-line options */ > > ! while ((c = getopt_long(argc, argv, "dD:E:L:nNU:WA:sT:X:", > long_options, &option_index)) != -1) > { > switch (c) > { > --- 3479,3485 ---- > > /* process command-line options */ > > ! while ((c = getopt_long(argc, argv, "dD:E:L:nNU:WA:sST:X:", > long_options, &option_index)) != -1) > { > switch (c) > { > *************** main(int argc, char *argv[]) > *** 3522,3527 **** > --- 3525,3533 ---- > case 'N': > do_sync = false; > break; > + case 'S': > + sync_only = true; > + break; > case 'L': > share_path = pg_strdup(optarg); > break; > *************** main(int argc, char *argv[]) > *** 3589,3594 **** > --- 3595,3608 ---- > exit(1); > } > > + /* If we only need to fsync, just to it and exit */ > + if (sync_only) > + { > + setup_pgdata(); > + perform_fsync(); > + return 0; > + } > + > if (pwprompt && pwfilename) > { > fprintf(stderr, _("%s: password prompt and password file cannot > be specified together\n"), progname); > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <br...@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers