Re: pg_upgrade: Error out on too many command-line arguments
Bonjour Michaël, I don't see why not. Perhaps this could be done for pgbench and oid2name as well? This is for pgbench. I did not find a TAP test in pg_upgrade, and I do not think it is worth creating one for that purpose. The "test.sh" script does not seem appropriate for this kind of error-coverage test. The TAP test for oid2name only includes basic checks; options and arguments are not even tested, and there is no infra for actual testing, eg starting a server… Improving that would be a much more significant effort than the one line I added to pgbench's existing TAP test. -- Fabien.
Re: refactoring - share str2*int64 functions
On Thu, Aug 29, 2019 at 08:14:54AM +0200, Fabien COELHO wrote: > Attached v7 does create uint64 overflow inline functions. The stuff yet is > not moved to "common/int.c", a file which does not exists, even if > "common/int.h" does. Thanks for the new version. I have begun reviewing your patch, and attached is a first cut that I would like to commit separately which adds all the compatibility overflow routines to int.h for uint16, uint32 and uint64 with all the fallback implementations (int128-based method added as well if available). I have also grouped at the top of the file the comments about each routine's return policy to avoid duplication. For the fallback implementations of uint64 using int128, I think that it is cleaner to use uint128 so as there is no need to check after negative results for multiplications, and I have made the various expressions consistent for each size. Attached is a small module called "overflow" with various regression tests that I used to check each implementation. I don't propose that for commit as I am not sure if that's worth the extra CPU, so let's consider it as a toy for now. What do you think? -- Michael overflow.tar.gz Description: application/gzip diff --git a/src/include/common/int.h b/src/include/common/int.h index d754798543..ccef1b4cc9 100644 --- a/src/include/common/int.h +++ b/src/include/common/int.h @@ -20,11 +20,26 @@ #ifndef COMMON_INT_H #define COMMON_INT_H -/* - * If a + b overflows, return true, otherwise store the result of a + b into - * *result. The content of *result is implementation defined in case of + +/*- + * The following guidelines apply to all the routines: + * - If a + b overflows, return true, otherwise store the result of a + b + * into *result. The content of *result is implementation defined in case of * overflow. + * - If a - b overflows, return true, otherwise store the result of a - b + * into *result. The content of *result is implementation defined in case of + * overflow. + * - If a * b overflows, return true, otherwise store the result of a * b + * into *result. The content of *result is implementation defined in case of + * overflow. + *- */ + +/* + * Overflow routines for signed integers + * + */ + static inline bool pg_add_s16_overflow(int16 a, int16 b, int16 *result) { @@ -43,11 +58,6 @@ pg_add_s16_overflow(int16 a, int16 b, int16 *result) #endif } -/* - * If a - b overflows, return true, otherwise store the result of a - b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_sub_s16_overflow(int16 a, int16 b, int16 *result) { @@ -66,11 +76,6 @@ pg_sub_s16_overflow(int16 a, int16 b, int16 *result) #endif } -/* - * If a * b overflows, return true, otherwise store the result of a * b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_mul_s16_overflow(int16 a, int16 b, int16 *result) { @@ -89,11 +94,6 @@ pg_mul_s16_overflow(int16 a, int16 b, int16 *result) #endif } -/* - * If a + b overflows, return true, otherwise store the result of a + b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_add_s32_overflow(int32 a, int32 b, int32 *result) { @@ -112,11 +112,6 @@ pg_add_s32_overflow(int32 a, int32 b, int32 *result) #endif } -/* - * If a - b overflows, return true, otherwise store the result of a - b into - * *result. The content of *result is implementation defined in case of - * overflow. 
- */ static inline bool pg_sub_s32_overflow(int32 a, int32 b, int32 *result) { @@ -135,11 +130,6 @@ pg_sub_s32_overflow(int32 a, int32 b, int32 *result) #endif } -/* - * If a * b overflows, return true, otherwise store the result of a * b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_mul_s32_overflow(int32 a, int32 b, int32 *result) { @@ -158,11 +148,6 @@ pg_mul_s32_overflow(int32 a, int32 b, int32 *result) #endif } -/* - * If a + b overflows, return true, otherwise store the result of a + b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_add_s64_overflow(int64 a, int64 b, int64 *result) { @@ -190,11 +175,6 @@ pg_add_s64_overflow(int64 a, int64 b, int64 *result) #endif } -/* - * If a - b overflows, return true, otherwise store the result of a - b into - * *result. The content of *result is implementation defined in case of - * overflow. - */ static inline bool pg_sub_s64_overflow(int64 a, int64 b, int64 *result) { @@ -222,11 +202,6 @@ pg_sub_s64_overflow(int64 a, int64 b, int64 *result) #endif } -/* - * If a * b overflows, return true, otherwise store the result of a * b into -
Re: refactoring - share str2*int64 functions
Michaël, attached is a first cut that I would like to commit separately which adds all the compatibility overflow routines to int.h for uint16, uint32 and uint64 with all the fallback implementations (int128-based method added as well if available). I have also grouped at the top of the file the comments about each routine's return policy to avoid duplication. For the fallback implementations of uint64 using int128, I think that it is cleaner to use uint128 so as there is no need to check after negative results for multiplications, and I have made the various expressions consistent for each size. Patch applies cleanly, compiles, "make check" ok, but the added functions are not used (yet). I think that factoring out comments is a good move. For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32 and uint32 could use uint64, and the division-based method be dropped in these cases. Maybe I would add a comment before each new section telling about the type, eg: /* * UINT16 */ add/sub/mul uint16 functions… I would tend to commit working solutions per type rather than by installment, so that at least all added functions are actually used somewhere, but it does not matter much, really. I was unsure that having int128 implies uint128 availability, so I did not use it. The checking extension is funny, but ISTM that these checks should be (are already?) included in some standard sql test, which will test the macros from direct SQL operations: fabien=# SELECT INT2 '1234512434343'; ERROR: value "1234512434343" is out of range for type smallint Well, a quick look at "src/test/regress/sql/int2.sql" suggests that there the existing tests should be extended… :-( -- Fabien.
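To make the widening suggestion above concrete, here is a minimal sketch (not the committed code, and assuming the uint16/uint32 typedefs and PG_UINT16_MAX from c.h) of a uint16 multiplication overflow check that follows the naming conventions of src/include/common/int.h:

static inline bool
pg_mul_u16_overflow(uint16 a, uint16 b, uint16 *result)
{
	/* widen to uint32, which cannot overflow for 16-bit inputs */
	uint32		res = (uint32) a * (uint32) b;

	if (res > PG_UINT16_MAX)
	{
		*result = 0x5EED;		/* keep the compiler quiet about *result */
		return true;
	}
	*result = (uint16) res;
	return false;
}

The same idea applies one size up: a uint32 multiplication can be checked by widening to uint64, leaving the division-based fallback only for uint64 when neither the builtins nor a 128-bit type are available.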
Re: POC: Cleaning up orphaned files using undo logs
Hello Thomas, I was doing some testing for the scenario where the undo written by a transaction overflows to multiple undo logs. For that I've modified the following macro: #define UndoLogMaxSize (1024 * 1024) /* 1MB undo log size */ (I should have used the provided pg_force_switch_undo though..) I'm getting the following assert failure while performing recovery with the same. "TRAP: FailedAssertion("slot->meta.status == UNDO_LOG_STATUS_FULL", File: "undolog.c", Line: 997)" I found that we don't emit a WAL record when we update slot->meta.status to UNDO_LOG_STATUS_FULL. If we don't do that, then after crash recovery some new transaction may use that undo log, which is wrong, IMHO. Am I missing something? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: basebackup.c's sendFile() ignores read errors
On Thu, Aug 29, 2019 at 3:17 PM Jeevan Ladhe wrote: > Hi Jeevan > > I had a look at the patch and this seems correct to me. > Thanks, Jeevan Ladhe. > > Few minor comments: > > + /* Check fread() error. */ > + CHECK_FREAD_ERROR(fp, pathbuf); > + > > The comments above the macro call at both the places are not necessary as > your macro name itself is self-explanatory. > > -- > + /* > + * If file is truncated, then we will hit > + * end-of-file error in which case we don't > + * want to error out, instead just pad it with > + * zeros. > + */ > + if (feof(fp)) > > The if block does not do the truncation right away, so I think the comment > above can be reworded to explain why we reset cnt? > Fixed both comments in the attached patch. -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index c91f66d..3aea470 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -90,6 +90,18 @@ static char *statrelpath = NULL; */ #define THROTTLING_FREQUENCY 8 +/* + * Checks whether we encountered any error in fread(). fread() doesn't give + * any clue what has happened, so we check with ferror(). Also, neither + * fread() nor ferror() set errno, so we just throw a generic error. + */ +#define CHECK_FREAD_ERROR(fp, filename) \ +do { \ + if (ferror(fp)) \ + ereport(ERROR, \ +(errmsg("could not read from file \"%s\"", filename))); \ +} while (0) + /* The actual number of bytes, transfer of which may cause sleep. */ static uint64 throttling_sample; @@ -543,6 +555,8 @@ perform_base_backup(basebackup_options *opt) break; } + CHECK_FREAD_ERROR(fp, pathbuf); + if (len != wal_segment_size) { CheckXLogRemoved(segno, tli); @@ -1478,6 +1492,20 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ) { +/* + * If the file is truncated, then we will hit + * an end-of-file error in which case we don't + * want to error out, instead just pad the rest + * of the file with zeros. However, we want to + * send all the bytes read so far, and thus set + * the cnt accordingly. + */ +if (feof(fp)) +{ + cnt = BLCKSZ * i; + break; +} + ereport(ERROR, (errcode_for_file_access(), errmsg("could not reread block %d of file \"%s\": %m", @@ -1529,7 +1557,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf len += cnt; throttle(cnt); - if (len >= statbuf->st_size) + if (feof(fp) || len >= statbuf->st_size) { /* * Reached end of file. The file could be longer, if it was @@ -1540,6 +1568,8 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf } } + CHECK_FREAD_ERROR(fp, readfilename); + /* If the file was truncated while we were sending it, pad it with zeros */ if (len < statbuf->st_size) {
Re: [HACKERS] CLUSTER command progress monitor
On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada wrote: > > Hi Michael, Alvaro and Robert! > > On 2019/08/14 11:52, Michael Paquier wrote: > > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: > >> On 2019/08/13 14:40, Tatsuro Yamada wrote: > >>> On 2019/08/02 3:43, Alvaro Herrera wrote: > Hmm, I'm trying this out now and I don't see the index_rebuild_count > ever go up. I think it's because the indexes are built using parallel > index build ... or maybe it was the table AM changes that moved things > around, not sure. There's a period at the end when the CLUSTER command > keeps working, but it's gone from pg_stat_progress_cluster. > >>> > >>> Thanks for your report. > >>> I'll investigate it. :) > >> > >> I did "git bisect" and found the commit: > >> > >> 03f9e5cba0ee1633af4abe734504df50af46fbd8 > >> Report progress of REINDEX operations > > > > I am adding an open item for this one. > > -- > > Michael > > > Okay, I checked it on the wiki. > >https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items >- index_rebuild_count in CLUSTER reporting never increments > > To be clear, 03f9e5cb broke CLUSTER progress reporting, but > I investigated little more and share my ideas to fix the problem. > > * Call stack > > cluster_rel >pgstat_progress_start_command(CLUSTER) *A1 >rebuild_relation > finish_heap_swap >reindex_relation > reindex_index >pgstat_progress_start_command(CREATE_INDEX) *B1 >pgstat_progress_end_command() *B2 > pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :( >pgstat_progress_end_command() *A2 > >Note > These are sets: >A1 and A2, >B1 and B2 > > > > * Ideas to fix >There are Three options, I guess. > >1. Call pgstat_progress_start_command(CLUSTER) again > before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i). > >2. Add "save and restore" functions for the following two > variables of MyBeentry in pgstat.c. > - st_progress_command > - st_progress_command_target > >3. Use Hash or List to store multiple values for the two > variables in pgstat.c. > > > > I tried 1. and it shown index_rebuild_count, but it also shown > "initializing" phase again and other columns were empty. So, it is > not suitable to fix the problem. :( > I'm going to try 2. and 3., but, unfortunately, I can't get enough > time to do that after PGConf.Asia 2019. > If we selected 3., it affects following these progress reporting > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > I suppose. Any comments welcome! :) I looked at this open item. I prefer #3 but I think it would be enough to have a small stack using a fixed length array to track nested progress information and the current executing command (maybe 4 or 8 would be enough for maximum nested level for now?). That way, we don't need any change the interfaces. AFAICS there is no case where we call only either pgstat_progress_start_command or pgstat_progress_end_command without each other (although pgstat_progress_update_param is called without start). So I think that having a small stack for tracking multiple reports would work. Attached the draft patch that fixes this issue. Please review it. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index d362e7f7d7..99e4844e6c 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3016,8 +3016,10 @@ pgstat_bestart(void) #endif lbeentry.st_state = STATE_UNDEFINED; - lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID; - lbeentry.st_progress_command_target = InvalidOid; + MemSet(&(lbeentry.st_progress_cmds), 0, + sizeof(PgBackendProgressInfo) * PGSTAT_MAX_PROGRESS_INFO); + /* Set invalid command index */ + lbeentry.st_current_cmd = -1; /* * we don't zero st_progress_param here to save cycles; nobody should @@ -3203,10 +3205,22 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid) if (!beentry || !pgstat_track_activities) return; + Assert(beentry->st_current_cmd >= -1); + + /* The given command is already started */ + if (beentry->st_progress_cmds[beentry->st_current_cmd].command == cmdtype) + return; + + /* The progress information queue is full */ + if (beentry->st_current_cmd >= PGSTAT_MAX_PROGRESS_INFO - 1) + elog(ERROR, "progress information per backends is full"); + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); - beentry->st_progress_command = cmdtype; - beentry->st_progress_command_target = relid; - MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param)); + beentry->st_current_cmd++; + MemSet(&(beentry->st_pr
Re: Yet another fast GiST build
> On 30 Aug 2019, at 3:47, Alexander Korotkov wrote: > > 1) Binary search in non-leaf pages instead of probing each key is much faster. That's a neat idea, but key union breaks ordering, even for z-order: for two sets of tuples X and Y, having Xi < Yj for all i and j does not guarantee union(X) < union(Y). For example, consider this z-ordered keyspace (picture attached): union(5, 9) is z-order-smaller than union(4, 4). I'm not even sure we can use sorted search for choosing the subtree for insertion. What do you think: should I supply the GiST-build patch with docs and tests and add it to the CF, or do we need more design discussion first? Best regards, Andrey Borodin.
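For readers not familiar with the keyspace being discussed, here is a tiny illustrative sketch (not part of the patch, using PostgreSQL's uint16/uint32 typedefs) of a 2-D z-order (Morton) key built by interleaving coordinate bits; the issue described above is that the union of two such keys is a bounding box, and bounding boxes do not preserve this ordering:

static uint32
zorder_encode(uint16 x, uint16 y)
{
	uint32		z = 0;
	int			i;

	/* interleave bits: x goes to even positions, y to odd positions */
	for (i = 0; i < 16; i++)
	{
		z |= ((uint32) ((x >> i) & 1)) << (2 * i);
		z |= ((uint32) ((y >> i) & 1)) << (2 * i + 1);
	}
	return z;
}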
Re: WIP: Generic functions for Node types using generated metadata
Hello Andres, Just my 0.02 €: There's been a lot of complaints over the years about how annoying it is to keep the out/read/copy/equalfuncs.c functions in sync with the actual underlying structs. There've been various calls for automating their generation, but no actual patches that I am aware of. I started something a while back, AFAICR after spending stupid time looking for a stupid missing field copy or whatever. I wrote a (simple) perl script deriving all (most) node utility functions from the header files. I gave up as the idea did not gather much momentum from committers, so I assumed the effort would be rejected in the end. AFAICR the feedback spirit was something like "node definitions do not change often, we can manage them by hand". There also recently has been discussion about generating more efficient memory layout for node trees that we know are read only (e.g. plan trees inside the plancache), and about copying such trees more efficiently (e.g. by having one big allocation, and then just adjusting pointers). If pointers are relative to the start, it could be just indexes that do not need much adjusting. One way to approach this problem would be to parse the type definitions, and directly generate code for the various functions. But that does mean that such a code-generator needs to be expanded for each such function. No big deal for the effort I made. The issue was more dealing with exceptions (eg "we do not serialize this field because it is not used for some reason") and understanding some implicit assumptions in the struct declarations. An alternative approach is to have a parser of the node definitions that doesn't generate code directly, but instead generates metadata. And then use that metadata to write node-aware functions. This seems more promising to me. Hmmm. The approach we had in an (old) research project was to write the metadata, and derive all struct & utility functions from it. It is simpler this way because you save parsing some C, and it can be made language agnostic (i.e. serializing the data structure from one language and reading its value from another). I'm fairly sure this metadata can also be used to write the other currently existing node functions. Beware of strange exceptions… With regards to using libclang for the parsing: I chose that because it seemed the easiest to experiment with, compared to annotating all the structs with enough metadata to be able to easily parse them from a perl script. I did not find this an issue when I tried, because the annotation needed is basically the type name of the field. The node definitions are after all distributed over quite a few headers. Yep. I think it might even be the correct way forward, over inventing our own mini-languages and writing ad-hoc parsers for those. It sure is easier to understand plain C code, compared to having to understand various embedded mini-languages consisting of macros. Dunno. The obvious drawback is that it'd require more people to install libclang - a significant imposition. Indeed. A perl-only dependence would be much simpler than relying on a particular library from a particular compiler to compile postgres, possibly with an unrelated compiler. Alternatively we could annotate the code enough to be able to write our own parser, or use some other C parser. If you can dictate some conventions, eg one line/one field, simple perl regexps would work well I think; you would not need a parser per se. 
I don't really want to invest significantly more time into this without first debating the general idea. That's what I did, and I quit quickly :-) On the general idea, I'm 100% convinced that stupid utility functions should be either generic or generated, not maintained by hand. -- Fabien.
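As a rough illustration of the metadata-driven direction discussed above, here is a toy sketch (entirely hypothetical, not taken from Andres's or Fabien's work): each node type carries a table describing its fields, and a single generic routine walks that table instead of a hand-maintained per-node function.

#include <stddef.h>
#include <stdbool.h>

typedef enum FieldKind
{
	FIELD_INT,
	FIELD_BOOL,
	FIELD_NODE_PTR
} FieldKind;

typedef struct FieldInfo
{
	const char *name;			/* field name, for out/read functions */
	FieldKind	kind;			/* how to copy/compare/serialize it */
	size_t		offset;			/* offsetof() within the node struct */
} FieldInfo;

/* hypothetical node type and its generated metadata */
typedef struct FooNode
{
	int			x;
	bool		flag;
} FooNode;

static const FieldInfo foo_fields[] = {
	{"x", FIELD_INT, offsetof(FooNode, x)},
	{"flag", FIELD_BOOL, offsetof(FooNode, flag)},
};

/* generic field-wise equality, driven entirely by the metadata table */
static bool
node_fields_equal(const void *a, const void *b,
				  const FieldInfo *fields, int nfields)
{
	int			i;

	for (i = 0; i < nfields; i++)
	{
		const char *fa = (const char *) a + fields[i].offset;
		const char *fb = (const char *) b + fields[i].offset;

		switch (fields[i].kind)
		{
			case FIELD_INT:
				if (*(const int *) fa != *(const int *) fb)
					return false;
				break;
			case FIELD_BOOL:
				if (*(const bool *) fa != *(const bool *) fb)
					return false;
				break;
			case FIELD_NODE_PTR:
				/* a real version would recurse using the pointee's metadata */
				break;
		}
	}
	return true;
}

Copy, out and read functions would walk the same tables, which is what makes maintaining a single metadata description pay off.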
Re: basebackup.c's sendFile() ignores read errors
> > Fixed both comments in the attached patch. > Thanks, the patch looks good to me. Regards, Jeevan Ladhe
Re: WIP: Generic functions for Node types using generated metadata
Hallo Andres, There've been various calls for automating their generation, but no actual patches that I am aware of. I started something a while back. I have found this thread: https://www.postgresql.org/message-id/flat/E1cq93r-0004ey-Mp%40gemulon.postgresql.org It seems that comments from committers discouraged me from going on… :-) For instance Robert wanted a "checker", which is basically harder than a generator because you have to parse both sides and then compare. Basically there was no consensus at the time (March 2017), so I gave up. It seems that I even destroyed the branch I was working on, so the no doubt magnificent perl script I wrote is also lost. -- Fabien.
Re: block-level incremental backup
Hi, I am doing some testing on pg_basebackup and pg_combinebackup patches. I have also tried to create tap test for pg_combinebackup by taking reference from pg_basebackup tap cases. Attaching first draft test patch. I have done some testing with compression options, both -z and -Z level is working with incremental backup. A minor comment : It is mentioned in pg_combinebackup help that maximum 10 incremental backup can be given with -i option, but I found maximum 9 incremental backup directories can be given at a time. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation On Thu, Aug 29, 2019 at 10:06 PM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this. I think we should either make > pg_combinebackup support restoration of tar incremental backup or restrict > taking the incremental backup in tar format until pg_combinebackup > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > It is arguable that one can take the incremental backup in tar format, > extract > that manually and then give the resultant directory as input to the > pg_combinebackup, but I think that kills the purpose of having > pg_combinebackup utility. > > Thoughts? > > Regards, > Jeevan Ladhe > diff --git a/src/bin/pg_combinebackup/t/pg_combinebackup.pl b/src/bin/pg_combinebackup/t/pg_combinebackup.pl new file mode 100644 index 000..e0f834a --- /dev/null +++ b/src/bin/pg_combinebackup/t/pg_combinebackup.pl @@ -0,0 +1,79 @@ +use strict; +use warnings; +use Cwd; +use Config; +use File::Basename qw(basename dirname); +use File::Path qw(rmtree); +use PostgresNode; +use TestLib; +use Test::More tests => 23; + +program_help_ok('pg_combinebackup'); +program_version_ok('pg_combinebackup'); +program_options_handling_ok('pg_combinebackup'); + +my $tempdir = TestLib::tempdir; + +my $node = get_new_node('main'); + +# Initialize node +$node->init(); +my $pgdata = $node->data_dir; + +# Change wal related setting for pg_basebackup to run +open my $conf, '>>', "$pgdata/postgresql.conf"; +print $conf "max_replication_slots = 10\n"; +print $conf "max_wal_senders = 10\n"; +print $conf "wal_level = replica\n"; +close $conf; +$node->start; + +$node->command_fails(['pg_combinebackup'], + 'pg_combinebackup needs full and incremental directory specified'); + +# Create an unlogged table to test that forks other than init are not copied. +$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)'); + +my $baseUnloggedPath = $node->safe_psql('postgres', + q{select pg_relation_filepath('base_unlogged')}); + +# Make sure main and init forks exist +ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base'); +ok(-f "$pgdata/$baseUnloggedPath",'unlogged main fork in base'); + +# Run full base backup. +$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup"], + 'pg_basebackup runs for full backup'); +ok(-f "$tempdir/backup/PG_VERSION", 'full backup was created'); + +# Unlogged relation forks other than init should not be copied +ok(-f "$tempdir/backup/${baseUnloggedPath}_init", + 'unlogged init fork in backup'); +ok( !-f "$tempdir/backup/$baseUnloggedPath", + 'unlogged main fork not in backup'); + +# Get LSN of last backup to use for incremental backupslurp_file +my @extract_lsn = split (" ", scalar TestLib::slurp_file("$tempdir/backup/backup_label")); +my $LSN = $extract_lsn[3]; + +# Run incr base backup. 
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup1",'--lsn', "$LSN"], + 'pg_basebackup runs for incremental backup'); +ok(-f "$tempdir/backup1/PG_VERSION", 'incremental backup was created'); + +# Unlogged relation forks other than init should not be copied +ok(-f "$tempdir/backup1/${baseUnloggedPath}_init", + 'unlogged init fork in backup'); +ok( !-f "$tempdir/backup1/$baseUnloggedPath", + 'unlogged main fork not in backup'); + +# Run pg_combinebackup. +$node->command_ok([ 'pg_combinebackup', '-f', "$tempdir/backup", '-i', "$tempdir/backup1", '-o', "$tempdir/backup2"], + 'pg_combinebackup runs'); +ok(-f "$tempdir/backup2/PG_VERSION", 'combined backup was created'); + +# Unlogged relation forks other than init should not be copied +ok(-f "$tempdir/backup2/${baseUnloggedPath}_init", + 'unlogged init fork in backup'); +ok( !-f "$tempdir/backup2/$baseUnloggedPath", + 'unlogged main fork not in backup');
Re: block-level incremental backup
Here are some comments: +/* The reference XLOG position for the incremental backup. */ +static XLogRecPtr refptr; As Robert already pointed out, we may want to pass this around as a parameter instead of a global variable. Also, it can be renamed to something like incr_backup_refptr. I see that in your earlier version of the patch this was named startincrptr, which I think was more meaningful. - /* * If incremental backup, see whether the filename is a relation filename * or not. */ Can be reworded to something like: "If incremental backup, check if it is relation file and can be sent partially." - + if (verify_checksum) + { + ereport(WARNING, + (errmsg("cannot verify checksum in file \"%s\", block " + "%d: read buffer size %d and page size %d " + "differ", + readfilename, blkno, (int) cnt, BLCKSZ))); + verify_checksum = false; + } For do_incremental_backup() it does not make sense to show the block number in the warning, as it is always going to be 0 when we throw this warning. Further, I think this can be rephrased as: "cannot verify checksum in file \"%s\", read file size %d is not multiple of page size %d". Or maybe we can just say: "cannot verify checksum in file \"%s\"" if a checksum was requested, disable the checksum and leave it to the following message: + ereport(WARNING, + (errmsg("file size (%d) not in multiple of page size (%d), sending whole file", + (int) cnt, BLCKSZ))); - If you agree on the above comment for blkno, then we can shift the declaration of blkno inside the condition " if (!sendwholefile)" in do_incremental_backup(), or avoid it altogether, and just pass "i" as blkindex, as well as blkno, to verify_page_checksum(). Maybe add a comment explaining why they are the same in case of incremental backup. - I think we should give the user a hint about where to read the input lsn for incremental backup, in the --help option as well as the documentation. Something like: "To take an incremental backup, please provide the value of "--lsn" as the "START WAL LOCATION" of the previously taken full or incremental backup, from its backup_label file." - pg_combinebackup: +static bool made_new_outputdata = false; +static bool found_existing_outputdata = false; Both of these are global. I understand that we need them global so that they are accessible in cleanup_directories_atexit(). But they are passed to verify_dir_is_empty_or_create() as parameters, which I think is not needed. Instead verify_dir_is_empty_or_create() can directly change the globals. - I see that checksum_failure is never set and always remains false. Maybe it is something that you wanted to set in combine_partial_files() when a corrupted partial file is detected? - I think the logic for verifying the backup chain should be moved out of the main() function into a separate function. - + /* + * Verify the backup chain. INCREMENTAL BACKUP REFERENCE WAL LOCATION of + * the incremental backup must match with the START WAL LOCATION of the + * previous backup, until we reach a full backup in which there is no + * INCREMENTAL BACKUP REFERENCE WAL LOCATION. + */ The current logic assumes the incremental backup directories are provided as input in the serial order the backups were taken. This is a bit confusing unless clarified in the pg_combinebackup help output or documentation; I think we should clarify it in both places. - I think scan_directory() should rather be renamed to do_combinebackup(). 
Regards, Jeevan Ladhe On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this. I think we should either make > pg_combinebackup support restoration of tar incremental backup or restrict > taking the incremental backup in tar format until pg_combinebackup > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > It is arguable that one can take the incremental backup in tar format, > extract > that manually and then give the resultant directory as input to the > pg_combinebackup, but I think that kills the purpose of having > pg_combinebackup utility. > > Thoughts? > > Regards, > Jeevan Ladhe >
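Regarding the backup-chain comment in the review above, a hypothetical sketch of what such a verification function could look like once moved out of main(). read_label_field() is an assumed helper that returns the LSN stored under a given key in a directory's backup_label file (or NULL if the key is absent), pg_log_error() stands for whatever error reporting the tool uses, and directories are assumed to be ordered from the full backup to the newest incremental:

static void
verify_backup_chain(char **backup_dirs, int nbackups)
{
	int			i;

	for (i = nbackups - 1; i > 0; i--)
	{
		char	   *refptr = read_label_field(backup_dirs[i],
											  "INCREMENTAL BACKUP REFERENCE WAL LOCATION");
		char	   *startptr = read_label_field(backup_dirs[i - 1],
												 "START WAL LOCATION");

		if (refptr == NULL)
			pg_log_error("\"%s\" is not an incremental backup",
						 backup_dirs[i]);
		else if (startptr == NULL || strcmp(refptr, startptr) != 0)
			pg_log_error("incremental backup \"%s\" is not based on backup \"%s\"",
						 backup_dirs[i], backup_dirs[i - 1]);
	}
}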
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Hi, > > This patch has been around for some time now, the last version fails to > apply cleanly and in-depth reviews have happened. I am moving that to > the next CF, waiting on its author. > > Unfortunately, nothing was changed since then, so there is already some amount > of unaddressed review feedback. I'll move this patch to "Returned with > feedback". > Craig Ringer mentioned this thread to me recently. This effort has seen decent reviews from Craig, Andres and Michael already, so I thought of refreshing it to work against the latest master HEAD. PFA the main patch as well as the test patch (I named the test patch v17 to be consistent with the main patch). The major grouse with the test patch AFAICS was the use of the non-Windows-compliant timersub() function. I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT family of portable macros instead. Please let me know what you think of the above. Regards, Nikhil -- Nikhil Sontakke 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/ 0002-libpq_batch_tests_community_master.v17.patch Description: Binary data 0001-libpq_batch_support_commmunity_master.v17.patch Description: Binary data
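For reference, the portable timing idiom mentioned above looks roughly like this; the INSTR_TIME_* macros come from src/include/portability/instr_time.h, while the surrounding function is only an illustrative stand-in for the test code:

#include "postgres_fe.h"
#include "portability/instr_time.h"

static double
elapsed_ms_sketch(void)
{
	instr_time	start;
	instr_time	end;

	INSTR_TIME_SET_CURRENT(start);

	/* ... run the batched queries being timed ... */

	INSTR_TIME_SET_CURRENT(end);
	INSTR_TIME_SUBTRACT(end, start);	/* end now holds end - start */

	return INSTR_TIME_GET_MILLISEC(end);
}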
Re: refactoring - share str2*int64 functions
On Fri, Aug 30, 2019 at 10:06:11AM +0200, Fabien COELHO wrote: > Patch applies cleanly, compiles, "make check" ok, but the added functions > are not used (yet). Thanks. > I think that factoring out comments is a good move. > > For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32 > and uint32 could use uint64, and the division-based method be dropped in > these cases. Yes, the division would be worse than the other. What do you think about using the previous module I sent and measuring how long it takes to evaluate the overflows in some specific cases N times in loops? > Maybe I would add a comment before each new section telling about the type, > eg: > > /* > * UINT16 > */ > add/sub/mul uint16 functions. Let's do as you suggest here. > I would tend to commit working solutions per type rather than by > installment, so that at least all added functions are actually used > somewhere, but it does not matter much, really. I prefer by section, with testing dedicated to each part properly done so that we can move to the next parts. > I was unsure that having int128 implies uint128 availability, so I did not > use it. The recent Ryu floating-point work has begun using them (see f2s.c and d2s.c). > The checking extension is funny, but ISTM that these checks should be (are > already?) included in some standard sql test, which will test the macros > from direct SQL operations: Sure. But not for the unsigned part as there are no equivalent in-core data types, still it is possible to trick things with signed integer arguments. I found my toy useful to test all implementations consistently. > fabien=# SELECT INT2 '1234512434343'; > ERROR: value "1234512434343" is out of range for type smallint > > Well, a quick look at "src/test/regress/sql/int2.sql" suggests that > there the existing tests should be extended… :-( We can tackle that separately. -32768 is perfectly legal for smallint, and the test is wrong here. -- Michael signature.asc Description: PGP signature
Re: pg_upgrade: Error out on too many command-line arguments
Fabien COELHO writes: > Could we maintain coverage by adding a TAP test? See 1 liner attached. Is this issue *really* worth expending test cycles on forevermore? Test cycles are not free, and I see zero reason to think that a check of this sort would ever catch any bugs. Now, if you had a way to detect that somebody had forgotten the case in some new program, that would be interesting. regards, tom lane
Re: pg_upgrade: Error out on too many command-line arguments
Hello Tom, Could we maintain coverage by adding a TAP test? See 1 liner attached. Is this issue *really* worth expending test cycles on forevermore? With this argument consistently applied, postgres code coverage is consistently weak, with 25% of the code never executed, and 15% of functions never called. "psql" is abysmal, "libpq" is really weak. Test cycles are not free, and I see zero reason to think that a check of this sort would ever catch any bugs. Now, if you had a way to detect that somebody had forgotten the case in some new program, that would be interesting. It could get broken somehow, and the test would catch it? That would be the only command which tests this feature? This is a TAP test, not a test run on a basic "make check". The cost is not measurable: pgbench's 533 TAP tests run in 5 wallclock seconds, and this added test does not change that much. Now, if you say you are against it, then it is rejected… -- Fabien.
Re: pg_upgrade: Error out on too many command-line arguments
Fabien COELHO writes: >> Is this issue *really* worth expending test cycles on forevermore? > With this argument consistently applied, postgres code coverage is > consistently weak, with 25% of the code never executed, and 15% of > functions never called. "psql" is abysmal, "libpq" is really weak. It's all a question of balance. If you go too far in the other direction, you end up with test suites that take an hour and a half to run so nobody ever runs them (case in point, mysql). I'm all for improving coverage in meaningful ways --- but cases like this seem unlikely to be worth ongoing expenditures of testing effort. regards, tom lane
Re: refactoring - share str2*int64 functions
Michaël, For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32 and uint32 could use uint64, and the division-based method be dropped in these cases. Yes, the division would be worse than the other. What do you think about using the previous module I sent and measuring how long it takes to evaluate the overflows in some specific cases N times in loops? Given the overheads of the SQL interpreter, I'm unsure about what you would measure at the SQL level. You could just write a small standalone C program to test perf and accuracy. Maybe this is what you have in mind. I would tend to commit working solutions per type rather than by installment, so that at least all added functions are actually used somewhere, but it does not matter much, really. I prefer by section, with testing dedicated to each part properly done so that we can move to the next parts. This suggests that you will test twice: once when adding the inlined functions, once when calling from SQL. The checking extension is funny, but ISTM that these checks should be (are already?) included in some standard sql test, which will test the macros from direct SQL operations: Sure. But not for the unsigned part as there are no equivalent in-core data types, Yep, it bothered me sometimes, but not enough to propose to add them. still it is possible to trick things with signed integer arguments. Is it? I found my toy useful to test all implementations consistently. Ok. fabien=# SELECT INT2 '1234512434343'; ERROR: value "1234512434343" is out of range for type smallint Well, a quick look at "src/test/regress/sql/int2.sql" suggests that there the existing tests should be extended… :-( We can tackle that separately. -32768 is perfectly legal for smallint, and the test is wrong here. Do you mean: sql> SELECT -32768::INT2; ERROR: smallint out of range This is not a negative constant but the negation of a positive constant, which is indeed out of range, although the error message could help more. sql> SELECT (-32768)::INT2; -32768 # ok sql> SELECT INT2 '-32768'; -32768 # ok -- Fabien.
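A standalone micro-benchmark of the kind suggested above could look roughly like this; the routines and iteration count are illustrative only, and a serious measurement would repeat runs and keep the compiler from optimizing the loop away:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static inline bool
mul_u32_overflow_div(uint32_t a, uint32_t b, uint32_t *r)
{
	/* division-based check, usable when no wider type is available */
	if (a != 0 && b > UINT32_MAX / a)
		return true;
	*r = a * b;
	return false;
}

static inline bool
mul_u32_overflow_widen(uint32_t a, uint32_t b, uint32_t *r)
{
	/* widening check: do the multiplication in 64 bits */
	uint64_t	res = (uint64_t) a * b;

	if (res > UINT32_MAX)
		return true;
	*r = (uint32_t) res;
	return false;
}

int
main(void)
{
	const long	n = 100000000;
	uint32_t	sink = 0;
	clock_t		t0 = clock();
	long		i;

	for (i = 0; i < n; i++)
	{
		uint32_t	r;

		if (!mul_u32_overflow_widen((uint32_t) i, 3, &r))
			sink ^= r;
	}
	printf("widen: %.3f s (sink %u)\n",
		   (double) (clock() - t0) / CLOCKS_PER_SEC, sink);
	/* run the same loop with mul_u32_overflow_div() to compare */
	return 0;
}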
Re: pg_upgrade: Error out on too many command-line arguments
Is this issue *really* worth expending test cycles on forevermore? With this argument consistently applied, postgres code coverage is consistently weak, with 25% of the code never executed, and 15% of functions never called. "psql" is abysmal, "libpq" is really weak. It's all a question of balance. If you go too far in the other direction, you end up with test suites that take an hour and a half to run so nobody ever runs them (case in point, mysql). I'm all for improving coverage in meaningful ways --- but cases like this seem unlikely to be worth ongoing expenditures of testing effort. Sure. I think there is room for several classes of tests: important ones always run, and others run, say, by the build farm. I thought that is what TAP tests were for, given they are quite expensive anyway (eg most TAP tests create their own postgres instance). So for me the suggestion was appropriate for a pgbench-specific TAP test. -- Fabien.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
FWIW my understanding is that the speedup comes mostly from elimination of the serialization to a file. That however requires savepoints to handle aborts of subtransactions - I'm pretty sure it'd be trivial to create a workload where this will be much slower (with many aborts of large subtransactions). I think that instead of defining savepoints it is simpler and more efficient to use BeginInternalSubTransaction + ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction as it is done in PL/pgSQL (pl_exec.c). Not sure if it can pr -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
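For context, the pattern from pl_exec.c that is being referred to looks roughly like this (a sketch only; the PG_TRY body stands in for applying the streamed changes, and a real caller would capture and report the error rather than merely flushing it):

#include "postgres.h"
#include "access/xact.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

static void
apply_in_subxact_sketch(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;

	BeginInternalSubTransaction(NULL);
	/* keep working in the caller's memory context */
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		/* ... apply the changes of the streamed subtransaction here ... */

		ReleaseCurrentSubTransaction();	/* commit the internal subxact */
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		MemoryContextSwitchTo(oldcontext);
		RollbackAndReleaseCurrentSubTransaction();	/* abort the subxact */
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
		FlushErrorState();
	}
	PG_END_TRY();
}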
Re: base backup client as auxiliary backend process
> Attached is a very hackish patch to implement this. It works like this: > > # (assuming you have a primary already running somewhere) > initdb -D data2 --replica > $EDITOR data2/postgresql.conf # set primary_conninfo > pg_ctl -D data2 start Attached is an updated patch for this. I have changed the initdb option name per suggestion. The WAL receiver is now started concurrently with the base backup. There is progress reporting (ps display), fsyncing. Configuration files are not copied anymore. There is a simple test suite. Tablespace support is still missing, but it would be straightforward. It's still all to be considered experimental, but it's taking shape and working pretty well. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From aae4640acbb2a1ae4ff5d2e80abce0798799fe73 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 30 Aug 2019 20:42:51 +0200 Subject: [PATCH v2] Base backup client as auxiliary backend process Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com --- doc/src/sgml/protocol.sgml| 12 +- doc/src/sgml/ref/initdb.sgml | 17 + src/backend/access/transam/xlog.c | 84 ++-- src/backend/bootstrap/bootstrap.c | 9 + src/backend/postmaster/pgstat.c | 6 + src/backend/postmaster/postmaster.c | 114 - src/backend/replication/basebackup.c | 68 +++ .../libpqwalreceiver/libpqwalreceiver.c | 419 ++ src/backend/replication/repl_gram.y | 9 +- src/backend/replication/repl_scanner.l| 1 + src/bin/initdb/initdb.c | 39 +- src/include/access/xlog.h | 6 + src/include/miscadmin.h | 2 + src/include/pgstat.h | 1 + src/include/replication/basebackup.h | 2 + src/include/replication/walreceiver.h | 4 + src/include/utils/guc.h | 2 +- src/test/recovery/t/018_basebackup.pl | 29 ++ 18 files changed, 768 insertions(+), 56 deletions(-) create mode 100644 src/test/recovery/t/018_basebackup.pl diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index b20f1690a7..81f43b5c00 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2466,7 +2466,7 @@ Streaming Replication Protocol -BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] [ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] +BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] [ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] [ EXCLUDE_CONF ] BASE_BACKUP @@ -2576,6 +2576,16 @@ Streaming Replication Protocol + + +EXCLUDE_CONF + + + Do not copy configuration files, that is, files that end in + .conf. + + + diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index da5c8f5307..1261e02d59 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -286,6 +286,23 @@ Options + + -r + --replica + + +Initialize a data directory for a physical replication replica. The +data directory will not be initialized with a full database system, +but will instead only contain a minimal set of files. A server that +is started on this data directory will first fetch a base backup and +then switch to standby mode. The connection information for the base +backup has to be configured by setting , and other parameters as desired, +before the server is started. 
+ + + + -S --sync-only diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e651a841bb..7ab8ab45f5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -905,8 +905,6 @@ static void CheckRecoveryConsistency(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); -static void WriteControlFile(void); -static void ReadControlFile(void); static char *str_time(pg_time_t tnow); static bool CheckForStandbyTrigger(void); @@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void) * ReadControlFile() verifies they are correct. We could split out the * I/O and compatibility-check functions, but there seems no need currently. */ -static void +void WriteControlFile(void) { int fd; @@ -4573,7 +4571,7 @@ WriteControlFile(void)
Re: block-level incremental backup
On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this. I think we should either make > pg_combinebackup support restoration of tar incremental backup or restrict > taking the incremental backup in tar format until pg_combinebackup > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > It is arguable that one can take the incremental backup in tar format, extract > that manually and then give the resultant directory as input to the > pg_combinebackup, but I think that kills the purpose of having > pg_combinebackup utility. I don't agree. You're right that you would have to untar (and uncompress) the backup to run pg_combinebackup, but you would also have to do that to restore a non-incremental backup, so it doesn't seem much different. It's true that for an incremental backup you might need to untar and uncompress multiple prior backups rather than just one, but that's just the nature of an incremental backup. And, on a practical level, if you want compression, which is pretty likely if you're thinking about incremental backups, the way to get that is to use tar format with -z or -Z. It might be interesting to teach pg_combinebackup to be able to read tar-format backups, but I think that there are several variants of the tar format, and I suspect it would need to read them all. If someone un-tars and re-tars a backup with a different tar tool, we don't want it to become unreadable. So we'd either have to write our own de-tarring library or add an external dependency on one. I don't think it's worth doing that at this point; I definitely don't think it needs to be part of the first patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company