On Sun, Mar 14, 2021, at 11:01 PM, Thomas Munro wrote: > On Wed, Mar 10, 2021 at 1:31 AM Michael Paquier <mich...@paquier.xyz> wrote: > > On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote: > > > Let's move this patch forward. Based on the responses, I agree the > > > default behavior should be to remove the temp files, and I think we > > > should have the GUC (on the off chance that someone wants to preserve > > > the temporary files for debugging or whatever other reason). > > > > Thanks for taking care of this. I am having some second-thoughts > > about changing this behavior by default, still that's much more useful > > this way. > > +1 for having it on by default. > > I was also just looking at this patch and came here to say LGTM except > for two cosmetic things, below. Thanks for taking a look at this patch. I'm not sure Tomas is preparing a new patch that includes the suggested modifications but I decided to do it. This new version has the new GUC name (using "remove"). I also replaced "cleanup" with "remove" in the all the remain places. As pointed by Thomas, I reworded the paragraph that describes the GUC moving the default information to the beginning of the sentence. I also added the "literal" as suggested by Michael. The postgresql.conf.sample was fixed. The tests was slightly modified. I reworded some comments and added a hack to avoid breaking the temporary file test in slow machines. A usleep() after sending the query provides some time for the query to create the temporary file. I used an arbitrarily sleep (10ms) that seems to be sufficient.
> > +#remove_temp_files_after_crash = on # remove temporary files after > > +# # backend crash? > > The indentation of the second line is incorrect here (Incorrect number > > of spaces in tabs perhaps?), and there is no need for the '#' at the > > beginning of the line. > > Yeah, that's wrong. For some reason that one file uses a tab size of > 8, unlike the rest of the tree (I guess because people will read that > file in software with the more common setting of 8). If you do :set > tabstop=8 in vim, suddenly it all makes sense, but it is revealed that > this patch has it wrong, as you said. (Perhaps this file should have > some of those special Vim/Emacs control messages so we don't keep > getting this wrong?) I hadn't noticed that this file use ts=8. (This explains the misalignment that I observed in some parameter comments). I'm not sure if people care about the indentation in this file. From the development perspective, we can use the same number of spaces for Tab as the source code so it is not required to fix your dev environment. However, the regular user doesn't follow the dev guidelines so it could probably observe the misalignment while using Vim (that has 8 spaces as default), for example. For now, I will add autocmd BufRead,BufNewFile postgresql.conf* setlocal ts=8 to my .vimrc. We should probably fix some settings that are misaligned such as parallel_setup_cost and shared_preload_libraries. The parameters timezone_abbreviations and max_pred_locks_per_page are using spaces instead of tabs and should probably be fixed too. -- Euler Taveira EDB https://www.enterprisedb.com/
From 8dfee5694210c14054170563ff01fc15a66a76b0 Mon Sep 17 00:00:00 2001 From: Euler Taveira <euler.tave...@2ndquadrant.com> Date: Mon, 25 May 2020 00:08:20 -0300 Subject: [PATCH v2] Control temporary files removal after crash A new GUC remove_temp_files_after_crash controls whether temporary files are removed after a crash. Successive crashes could result in useless storage usage until service is restarted. It could be the case on host with inadequate resources. This manual intervention for some environments is not desirable. This GUC is marked as SIGHUP hence you don't have to interrupt the service to change it. The current behavior introduces a backward incompatibility (minor one) which means Postgres will reclaim temporary file space after a crash. The main reason is that we favor service continuity over debugging. --- doc/src/sgml/config.sgml | 18 ++ src/backend/postmaster/postmaster.c | 5 + src/backend/storage/file/fd.c | 12 +- src/backend/utils/misc/guc.c | 9 + src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/postmaster/postmaster.h | 1 + src/test/recovery/t/022_crash_temp_files.pl | 192 ++++++++++++++++++ 7 files changed, 234 insertions(+), 5 deletions(-) create mode 100644 src/test/recovery/t/022_crash_temp_files.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5e551b9f6d..c0b1b48136 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -9648,6 +9648,24 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir' </listitem> </varlistentry> + <varlistentry id="guc-remove-temp-files-after-crash" xreflabel="remove_temp_files_after_crash"> + <term><varname>remove_temp_files_after_crash</varname> (<type>boolean</type>) + <indexterm> + <primary><varname>remove_temp_files_after_crash</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + When set to <literal>on</literal>, which is the default, + <productname>PostgreSQL</productname> will automatically remove + temporary files after a backend crash. If disabled, sucessive crashes + (while queries are using temporary files) could result in accumulation + of useless files. However, in some circumstances (e.g. debugging), + you want to examine these temporary files. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-data-sync-retry" xreflabel="data_sync_retry"> <term><varname>data_sync_retry</varname> (<type>boolean</type>) <indexterm> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index e8af05c04e..fc9c769bc6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -242,6 +242,7 @@ bool Db_user_namespace = false; bool enable_bonjour = false; char *bonjour_name; bool restart_after_crash = true; +bool remove_temp_files_after_crash = true; /* PIDs of special child processes; 0 when not running */ static pid_t StartupPID = 0, @@ -3976,6 +3977,10 @@ PostmasterStateMachine(void) ereport(LOG, (errmsg("all server processes terminated; reinitializing"))); + /* remove leftover temporary files after a crash */ + if (remove_temp_files_after_crash) + RemovePgTempFiles(); + /* allow background workers to immediately restart */ ResetBackgroundWorkerCrashTimes(); diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index b58502837a..8c75d135a2 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -3024,11 +3024,13 @@ CleanupTempFiles(bool isCommit, bool isProcExit) * remove any leftover files created by OpenTemporaryFile and any leftover * temporary relation files created by mdcreate. * - * NOTE: we could, but don't, call this during a post-backend-crash restart - * cycle. The argument for not doing it is that someone might want to examine - * the temp files for debugging purposes. This does however mean that - * OpenTemporaryFile had better allow for collision with an existing temp - * file name. + * During post-backend-crash restart cycle, this routine could be called if + * remove_temp_files_after_crash GUC is enabled. Multiple crashes while + * queries are using temp files could result in useless storage usage that can + * only be reclaimed by a service restart. The argument against enabling it is + * that someone might want to examine the temporary files for debugging + * purposes. This does however mean that OpenTemporaryFile had better allow for + * collision with an existing temp file name. * * NOTE: this function and its subroutines generally report syscall failures * with ereport(LOG) and keep going. Removing temp files is not so critical diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 855076b1fd..9b6170a6cd 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1361,6 +1361,15 @@ static struct config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, + { + {"remove_temp_files_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS, + gettext_noop("Remove temporary files after backend crash."), + NULL + }, + &remove_temp_files_after_crash, + true, + NULL, NULL, NULL + }, { {"log_duration", PGC_SUSET, LOGGING_WHAT, diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index f46c2dd7a8..1b67420c9f 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -757,6 +757,8 @@ #exit_on_error = off # terminate session on any error? #restart_after_crash = on # reinitialize after backend crash? +#remove_temp_files_after_crash = on # remove temporary files after + # backend crash? #data_sync_retry = off # retry or panic on failure to fsync # data? # (change requires restart) diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h index cfa59c4dc0..0efdd7c232 100644 --- a/src/include/postmaster/postmaster.h +++ b/src/include/postmaster/postmaster.h @@ -29,6 +29,7 @@ extern bool log_hostname; extern bool enable_bonjour; extern char *bonjour_name; extern bool restart_after_crash; +extern bool remove_temp_files_after_crash; #ifdef WIN32 extern HANDLE PostmasterHandle; diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl new file mode 100644 index 0000000000..c37b227770 --- /dev/null +++ b/src/test/recovery/t/022_crash_temp_files.pl @@ -0,0 +1,192 @@ +# Test remove of temporary files after a crash. +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More; +use Config; +use Time::HiRes qw(usleep); + +plan tests => 9; + + +# To avoid hanging while expecting some specific input from a psql +# instance being driven by us, add a timeout high enough that it +# should never trigger even on very slow machines, unless something +# is really wrong. +my $psql_timeout = IPC::Run::timer(60); + +my $node = get_new_node('node_crash'); +$node->init(); +$node->start(); + +# By default, PostgresNode doesn't restart after crash +# Reduce work_mem to generate temporary file with a few number of rows +$node->safe_psql( + 'postgres', + q[ALTER SYSTEM SET remove_temp_files_after_crash = on; + ALTER SYSTEM SET log_connections = 1; + ALTER SYSTEM SET work_mem = '64kB'; + ALTER SYSTEM SET restart_after_crash = on; + SELECT pg_reload_conf();]); + +# create table, insert rows +$node->safe_psql( + 'postgres', + q[CREATE TABLE tab_crash (a text); + INSERT INTO tab_crash (a) SELECT gen_random_uuid() FROM generate_series(1, 500);]); + +# Run psql, keeping session alive, so we have an alive backend to kill. +my ($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', ''); +my $killme = IPC::Run::start( + [ + 'psql', '-X', '-qAt', '-v', 'ON_ERROR_STOP=1', '-f', '-', '-d', + $node->connstr('postgres') + ], + '<', + \$killme_stdin, + '>', + \$killme_stdout, + '2>', + \$killme_stderr, + $psql_timeout); + +# Get backend pid +$killme_stdin .= q[ +SELECT pg_backend_pid(); +]; +ok(pump_until($killme, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m), + 'acquired pid for SIGKILL'); +my $pid = $killme_stdout; +chomp($pid); +$killme_stdout = ''; +$killme_stderr = ''; + +# Run the query that generates a temporary file and that will be killed before +# it finishes. Since the query that generates the temporary file does not +# return before the connection is killed, use a SELECT before to trigger +# pump_until. +$killme_stdin .= q[ +BEGIN; +SELECT $$in-progress-before-sigkill$$; +WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo; +]; +ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m), + 'select in-progress-before-sigkill'); +$killme_stdout = ''; +$killme_stderr = ''; + +# Wait some time so the temporary file is generated by SELECT +usleep(10_000); + +# Kill with SIGKILL +my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid); +is($ret, 0, 'killed process with KILL'); + +# Close psql session +$killme->finish; + +# Wait till server restarts +$node->poll_query_until('postgres', 'SELECT 1', '1'); + +# Check for temporary files +is($node->safe_psql( + 'postgres', + 'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)'), + qq(0), 'no temporary files'); + +# +# Test old behavior (don't remove temporary files after crash) +# +$node->safe_psql( + 'postgres', + q[ALTER SYSTEM SET remove_temp_files_after_crash = off; + SELECT pg_reload_conf();]); + +# Restart psql session +($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', ''); +$killme->run(); + +# Get backend pid +$killme_stdin .= q[ +SELECT pg_backend_pid(); +]; +ok(pump_until($killme, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m), + 'acquired pid for SIGKILL'); +$pid = $killme_stdout; +chomp($pid); +$killme_stdout = ''; +$killme_stderr = ''; + +# Run the query that generates a temporary file and that will be killed before +# it finishes. Since the query that generates the temporary file does not +# return before the connection is killed, use a SELECT before to trigger +# pump_until. +$killme_stdin .= q[ +BEGIN; +SELECT $$in-progress-before-sigkill$$; +WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo; +]; +ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m), + 'select in-progress-before-sigkill'); +$killme_stdout = ''; +$killme_stderr = ''; + +# Wait some time so the temporary file is generated by SELECT +usleep(10_000); + +# Kill with SIGKILL +$ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid); +is($ret, 0, 'killed process with KILL'); + +# Close psql session +$killme->finish; + +# Wait till server restarts +$node->poll_query_until('postgres', 'SELECT 1', '1'); + +# Check for temporary files -- should be there +is($node->safe_psql( + 'postgres', + 'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)'), + qq(1), 'one temporary file'); + +# Restart should remove the temporary files +$node->restart(); + +# Check the temporary files -- should be gone +is($node->safe_psql( + 'postgres', + 'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)'), + qq(0), 'temporary file was removed'); + +$node->stop(); + +# Pump until string is matched, or timeout occurs +sub pump_until +{ + my ($proc, $stream, $untl) = @_; + $proc->pump_nb(); + while (1) + { + last if $$stream =~ /$untl/; + if ($psql_timeout->is_expired) + { + diag("aborting wait: program timed out"); + diag("stream contents: >>", $$stream, "<<"); + diag("pattern searched for: ", $untl); + + return 0; + } + if (not $proc->pumpable()) + { + diag("aborting wait: program died"); + diag("stream contents: >>", $$stream, "<<"); + diag("pattern searched for: ", $untl); + + return 0; + } + $proc->pump(); + } + return 1; +} -- 2.20.1