Re: optimize file transfer in pg_upgrade
I've spent quite a bit of time recently trying to get this patch set into a reasonable state. It's still a little rough around the edges, and the code for the generated scripts is incomplete, but I figured I'd at least get some CI testing going. -- nathan >From 0af23114cfe5d00ab0b69ff804bb92d58d485adb Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 19 Feb 2025 09:14:51 -0600 Subject: [PATCH v3 1/4] initdb: Add --no-sync-data-files. This new option instructs initdb to skip synchronizing any files in database directories and the database directories themselves, i.e., everything in the base/ subdirectory and any other tablespace directories. Other files, such as those in pg_wal/ and pg_xact/, will still be synchronized unless --no-sync is also specified. --no-sync-data-files is primarily intended for internal use by tools that separately ensure the skipped files are synchronized to disk. A follow-up commit will use this to help optimize pg_upgrade's file transfer step. Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan --- doc/src/sgml/ref/initdb.sgml| 20 + src/bin/initdb/initdb.c | 10 ++- src/bin/initdb/t/001_initdb.pl | 1 + src/bin/pg_basebackup/pg_basebackup.c | 2 +- src/bin/pg_checksums/pg_checksums.c | 2 +- src/bin/pg_combinebackup/pg_combinebackup.c | 2 +- src/bin/pg_rewind/file_ops.c| 2 +- src/common/file_utils.c | 85 + src/include/common/file_utils.h | 2 +- 9 files changed, 89 insertions(+), 37 deletions(-) diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 0026318485a..14c401b9a99 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -527,6 +527,26 @@ PostgreSQL documentation + + --no-sync-data-files + + +By default, initdb safely writes all database files +to disk. This option instructs initdb to skip +synchronizing all files in the individual database directories and the +database directories themselves, i.e., everything in the +base subdirectory and any other tablespace +directories. Other files, such as those in pg_wal +and pg_xact, will still be synchronized unless the +--no-sync option is also specified. + + +This option is primarily intended for internal use by tools that +separately ensure the skipped files are synchronized to disk. 
+ + + + --no-instructions diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 21a0fe3ecd9..22b7d31b165 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -168,6 +168,7 @@ static bool data_checksums = true; static char *xlog_dir = NULL; static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024); static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC; +static bool sync_data_files = true; /* internal vars */ @@ -2566,6 +2567,7 @@ usage(const char *progname) printf(_(" -L DIRECTORY where to find the input files\n")); printf(_(" -n, --no-cleando not clean up after errors\n")); printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n")); + printf(_(" --no-sync-data-files do not sync files within database directories\n")); printf(_(" --no-instructions do not print instructions for next steps\n")); printf(_(" -s, --showshow internal settings, then exit\n")); printf(_(" --sync-method=METHOD set method for syncing files to disk\n")); @@ -3208,6 +3210,7 @@ main(int argc, char *argv[]) {"icu-rules", required_argument, NULL, 18}, {"sync-method", required_argument, NULL, 19}, {"no-data-checksums", no_argument, NULL, 20}, + {"no-sync-data-files", no_argument, NULL, 21}, {NULL, 0, NULL, 0} }; @@ -3402,6 +3405,9 @@ main(int argc, char *argv[]) case 20: data_checksums = false; break; + case 21: + sync_data_files = false; + break; default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --help\" for more information.", progname); @@ -3453,7 +3459,7 @@ main(int argc, char *argv[]) fputs(_("syncing data to disk ... "), stdout); fflush(stdout); - sync_pgdata(pg_data, PG_VERSION_NUM, sync_method); + sync_pgdata(pg_data, PG_VERSION_NUM, sync_method, sync_data_files); check_ok(); return 0; } @
Re: optimize file transfer in pg_upgrade
On Fri, Feb 28, 2025 at 2:40 PM Robert Haas wrote: > Maybe we should rethink the decision not to transfer relfilenodes for > sequences. Or have more than one way to do it. pg_upgrade > --binary-upgrade --binary-upgrade-even-for-sequences, or whatever. Sorry, I meant: pg_dump --binary-upgrade --binary-upgrade-even-for-sequences i.e. pg_upgrade could decide which way to ask pg_dump to do it, depending on versions and flags. -- Robert Haas EDB: http://www.enterprisedb.com
Re: making EXPLAIN extensible
On Fri, 28 Feb 2025 at 19:26, Robert Haas wrote:

> Prior to PostgreSQL 10, EXPLAIN had just 2 options: VERBOSE and ANALYZE. Now, we're up to 12 options, which is already quite a lot, and there's plenty more things that somebody might like to do. However, not all of those things necessarily need to be part of the core code. My original reason for wanting to extend EXPLAIN was that I was thinking about an extension that would want to do a bunch of things, and one of those things would be to add some information to the EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN option whose whole purpose is to cater to the needs of some extension, so that made me think of providing some extensibility infrastructure.
>
> However, there are other use cases, too, basically any of the normal reasons why extensibility is useful and desirable. You might need to get some information out of a query plan that 99% of people don't care about. You could come up with your own way of formatting a query plan, but that's a big pain. It's a lot nicer if you can just add the detail that you care about to the EXPLAIN output without needing to modify PostgreSQL itself. Even if you think of something that really ought to be included in the EXPLAIN output by PostgreSQL, you can roll an extension out much quicker than you can get a change upstreamed and released. So I think EXPLAIN extensibility is, as a general concept, useful.
>
> So here are some patches.
>
> 0001 allows a loadable module to register new EXPLAIN options. Currently, EXPLAIN (FUNGUS) will error out, but if you want to make it work, this patch is for you. This patch also allows you to stash some state related to your new option, or options, in the ExplainState. Core options have hard-coded structure members; e.g. EXPLAIN (BUFFERS) sets es->buffers. If you add EXPLAIN (FUNGUS), there won't be an es->fungus, but you can get about the same effect using the new facilities provided here.
>
> 0002 provides hooks that you can use to make your new EXPLAIN options actually do something. In particular, this adds a new hook that is called once per PlanState node, and a new hook that is called once per PlannedStmt. Each is called at an appropriate point for you to tack on more output after what EXPLAIN would already produce.
>
> 0003 adds a new contrib module called pg_overexplain, which adds EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE). I actually think this is quite useful for planner hacking, and maybe a few more options would be, too. Right now, if you want to see stuff that EXPLAIN doesn't clearly show, you have to use SET debug_print_plan = true, and that output is so verbose that finding the parts you actually want to see is quite difficult. Assuming it gives you the details you need, EXPLAIN (RANGE_TABLE) looks way, way better to me, and if we end up committing these patches I anticipate using this semi-regularly.
>
> There are plenty of debatable things in this patch set, and I mention some of them in the commit messages. The hook design in 0002 is a bit simplistic and could be made more complex; there's lots of stuff that could be added to or removed from 0003, much of which comes down to what somebody hacking on the planner would actually want to see. I'm happy to bikeshed all of that stuff; this is all quite preliminary and I'm not committed to the details. The only thing that would disappoint me is if somebody said "this whole idea of making EXPLAIN extensible is stupid and pointless and we shouldn't ever do it." I will argue against that vociferously. I think even what I have here is enough to disprove that hypothesis, but I have a bunch of ideas about how to do more. Some of those require additional infrastructure and are best proposed with that other infrastructure; some can be done with just this, but I ran out of time to code up examples, so here is what I have got so far.
>
> Hope you like it, sorry if you don't.

"pg_overexplain"? I love this name! And the idea sounds like a natural evolution, so +1.

Some questions:

One thing I am wondering is whether extensions should be required to prefix their EXPLAIN option with the extension name to avoid collisions. If two extensions happen to choose the same name, it won't be possible to use both simultaneously.

In what order would the options be applied? Would it be deterministic, or weighted within the extension's configuration, or based on the order of the options in the list?

Would EXPLAIN extensions be capable of modifying pre-existing core option output, or just appending to it?

Should there be a way of determining which lines are output by which option? An extension may produce output similar to the core output, making it difficult or impossible to discern which is which.

Do there need to be any security considerations so that things like RLS don't inadvertently become leaky?

Thom
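As a rough illustration of how the proposed contrib module would be exercised (the module and option names come from the 0003 patch described above; how the module is loaded is an assumption here, not something the patch text confirms):

```sql
-- Sketch only: pg_overexplain, EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) are
-- from the proposed 0003 patch; loading it with LOAD is an assumption.
LOAD 'pg_overexplain';
EXPLAIN (RANGE_TABLE) SELECT relname FROM pg_class WHERE relname = 'pg_proc';
EXPLAIN (DEBUG) SELECT 1;
```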
Improve monitoring of shared memory allocations
Hi,

The 0001* patch improves the accounting for the shared memory allocated for a hash table during hash_create. pg_shmem_allocations tracks the memory allocated by ShmemInitStruct, which, for shared hash tables, only covers memory allocated for the hash directory and control structure via ShmemInitHash. The hash segments and buckets are allocated using ShmemAllocNoError, which does not attribute the allocation to the hash table and also does not add it to ShmemIndex. Therefore, these allocations are not tracked in pg_shmem_allocations.

To improve this, include the allocation of segments and buckets in the initial allocation of the shared memory for the hash table, in ShmemInitHash. This will result in pg_shmem_allocations representing the total size of the initial hash table, including all the buckets and elements, instead of just the directory size.

Like ShmemAllocNoError, the shared memory allocated by ShmemAlloc is not tracked by pg_shmem_allocations. The 0002* patch replaces most of the calls to ShmemAlloc with ShmemInitStruct to associate a name with the allocations and ensure that they get tracked by pg_shmem_allocations.

I observed an improvement in total memory allocation by consolidating the initial shared memory allocations for the hash table. For example, the allocated size for the LOCK hash table created in hash_create decreased from 801664 bytes to 799616 bytes.

Please find the attached patches, which I will add to the March Commitfest.

Thank you,
Rahila Syed

0001-Account-for-initial-shared-memory-allocated-during-h.patch
Description: Binary data

0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch
Description: Binary data
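For reference, the view mentioned above can be queried directly; with the 0001 patch applied, the row for a shared hash table should account for its segments and buckets as well, not just the directory (a minimal example against the existing pg_shmem_allocations view):

```sql
-- pg_shmem_allocations is an existing system view; with the patch, the
-- per-hash-table rows described above should report the larger, consolidated
-- allocated_size.
SELECT name, size, allocated_size
FROM pg_shmem_allocations
ORDER BY allocated_size DESC
LIMIT 10;
```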
Re: lwlocknames.h beautification attempt
Gurjeet Singh writes: > On Sat, Mar 1, 2025 at 10:26 PM Tom Lane wrote: >> This looks reasonably in line with project style ... > Should I create a commitfest entry for this patch, or is it > uncontroversial enough and small enough to not warrant that? The controversy would be more about whether it's worth bothering with. In the past we've taken some care to ensure that generated files would pass pgindent's ideas of correct formatting, but not more than that. AFAICT from some quick tests, pgindent has no opinions about layout of #define lines. regards, tom lane
Re: Statistics Import and Export
Jeff Davis writes:
> On Sat, 2025-03-01 at 13:52 -0500, Greg Sabino Mullane wrote:
>> Also, anything trained to parse pg_dump output will have to learn about the new SELECT pg_restore_ calls with their multi-line formats (not 100% sure we don't have that anywhere, as things like "SELECT setval" and "SELECT set_config" are single line, but there may be existing things)

> That's an interesting point. What tools are currently trying to parse pg_dump output?

That particular argument needs to be rejected vociferously. Otherwise we could never make any change at all in what pg_dump emits. I think the standard has to be "if you parse pg_dump output, it's on you to cope with any legal SQL".

I do grasp Greg's larger point that this is a big change in pg_dump's behavior and will certainly break some expectations. I kind of lean to the position that we'll be sad in the long run if we don't change the default, though. What other part of pg_dump's output is not produced by default?

regards, tom lane
Re: Incorrect result of bitmap heap scan.
On Fri, Feb 28, 2025 at 12:53 PM Matthias van de Meent wrote: > Rebased again, now on top of head due to f7a8fc10. Is everybody in agreement about committing and back patching this fix, which simply disables the optimization altogether? I myself don't see a better way, but thought I'd ask before proceeding with review and commit. -- Peter Geoghegan
Re: Incorrect result of bitmap heap scan.
Peter Geoghegan writes: > Is everybody in agreement about committing and back patching this fix, > which simply disables the optimization altogether? > I myself don't see a better way, but thought I'd ask before proceeding > with review and commit. If you don't see a clear path forward, then "disable" is the only reasonable choice for the back branches. Maybe we'll find a fix in future, but it seems unlikely that it'd be back-patchable. regards, tom lane
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sat, Mar 1, 2025 at 10:50 AM Sutou Kouhei wrote: > > Hi, > > Our 0001/0002 patches were merged into master. I've rebased > on master. Can we discuss how to proceed rest patches? > > The contents of them aren't changed but I'll show a summary > of them again: > > 0001-0003 are for COPY TO and 0004-0007 are for COPY FROM. > > For COPY TO: > > 0001: Add support for adding custom COPY TO format. This > uses tablesample like handler approach. We've discussed > other approaches such as USING+CREATE XXX approach but it > seems that other approaches are overkill for this case. > > See also: > https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8 > > 0002: Export CopyToStateData to implement custom COPY TO > format as extension. > > 0003: Export a function and add a private space to > CopyToStateData to implement custom COPY TO format as > extension. > > We may want to squash 0002 and 0003 but splitting them will > be easy to review. Because 0002 just moves existing codes > (with some rename) and 0003 just adds some codes. If we > squash 0002 and 0003, moving and adding are mixed. > > For COPY FROM: > > 0004: This is COPY FROM version of 0001. > > 0005: 0002 has COPY_ prefix -> COPY_DEST_ prefix change for > enum CopyDest. This is similar change for enum CopySource. > > 0006: This is COPY FROM version of 0003. > > 0007: This is for easy to implement "ON_ERROR stop" and > "LOG_VERBOSITY verbose" in extension. > > We may want to squash 0005-0007 like for 0002-0003. > > > Thanks, > -- > kou While review another thread (Emitting JSON to file using COPY TO), I found the recently committed patches on this thread pass the CopyFormatOptions struct directly rather a pointer of the struct as a function parameter of CopyToGetRoutine and CopyFromGetRoutine. Then I took a quick look at the newly rebased patch set and found Sutou has already fixed this issue. I'm wondering if we should fix it as a separate commit as it seems like an oversight of previous patches? -- Regards Junwang Zhao
Re: Emitting JSON to file using COPY TO
On Mon, Jan 27, 2025 at 4:17 PM jian he wrote: > > hi. > > There are two ways we can use to represent the new copy format: json. > 1. > typedef struct CopyFormatOptions > { > boolbinary;/* binary format? */ > boolfreeze;/* freeze rows on loading? */ > boolcsv_mode;/* Comma Separated Value format? */ > booljson_mode;/* JSON format? */ > ... > } > > 2. > typedef struct CopyFormatOptions > { > CopyFormatformat;/* format of the COPY operation */ > . > } > > typedef enum CopyFormat > { > COPY_FORMAT_TEXT = 0, > COPY_FORMAT_BINARY, > COPY_FORMAT_CSV, > COPY_FORMAT_JSON, > } CopyFormat; > > both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184. > so the struct size will not influence the performance. > > I also did some benchmarks when using CopyFormat. > the following are the benchmarks info: > --- > create unlogged table t as select g from generate_series(1, 1_000_000) g; > > build_type=release patch: > copy t to '/dev/null' json \watch i=0.1 c=10 > last execution Time: 108.741 ms > > copy t to '/dev/null' (format text) \watch i=0.1 c=10 > last execution Time: 42.600 ms > > build_type=release master: > copy t to '/dev/null' (format text) \watch i=0.1 c=10 > last execution Time Time: 42.948 ms > > - > so a new version is attached, using the struct CopyFormatOptions. > > changes mainly in CopyOneRowTo. > now it is: > > if(!cstate->rel) > { > memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0), >TupleDescAttr(cstate->queryDesc->tupDesc, 0), >cstate->queryDesc->tupDesc->natts * > sizeof(FormData_pg_attribute)); > for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++) > populate_compact_attribute(slot->tts_tupleDescriptor, i); > BlessTupleDesc(slot->tts_tupleDescriptor); > } > > reasoning for change: > for composite_to_json to construct json key, we only need > FormData_pg_attribute.attname > but code path > composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute > means we also need to call populate_compact_attribute to populate > other attributes. > > v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch, > author is by Joel Jacobson. > As I mentioned in above, > replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions. > but consolidated 3 bool fields into one enum to make code more lean. > I think the refactoring (v14-0001) is worth it. I've refactored the patch to adapt the newly introduced CopyToRoutine struct, see 2e4127b6d2. v15-0001 is the merged one of v14-0001 and v14-0002 There are some other ongoing *copy to/from* refactors[1] which we can benefit to make the code cleaner, especially the checks done in ProcessCopyOptions. [1]: https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com -- Regards Junwang Zhao v15-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch Description: Binary data v15-0001-Introduce-json-format-for-COPY-TO.patch Description: Binary data
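For illustration, the syntax being proposed, as exercised in the benchmark quoted above (the json COPY format itself is what these patches add; the explicit FORMAT option spelling is assumed to mirror the existing text/csv/binary options):

```sql
-- Sketch of the proposed usage; the table setup is taken from the benchmark
-- quoted above, and FORMAT json is the option added by the patch.
CREATE UNLOGGED TABLE t AS SELECT g FROM generate_series(1, 1_000_000) g;
COPY t TO '/dev/null' (FORMAT json);
```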
Re: Statistics Import and Export
> > Independently of that, do we want to switch over to storing > reltuples as a string instead of converting it? I still feel > uncomfortable about the amount of baggage we added to pg_dump > to avoid that. I'm obviously a 'yes' vote for string, either fixed width buffer or pg_strdup'd, for the reduced complexity. I'm not dismissing concerns about memory usage, and we could free the RelStatsInfo structure after use, but we're already not freeing the parent structures tbinfo or indxinfo, probably because they're needed right up til the end of the program, and there's no subsequent consumer for the memory that we'd be freeing up.
lwlocknames.h beautification attempt
Currently the contents of lwlocknames.h look like this:

#define ShmemIndexLock (&MainLWLockArray[1].lock)
#define OidGenLock (&MainLWLockArray[2].lock)
#define XidGenLock (&MainLWLockArray[3].lock)
#define ProcArrayLock (&MainLWLockArray[4].lock)
#define SInvalReadLock (&MainLWLockArray[5].lock)
...

This makes it a bit hard to read, since there is no attempt at vertical alignment along the opening parentheses. It gets worse if the editor performs syntax highlighting, and the various colored characters in the definitions appear haphazardly arranged. Had this been hand-written, I can imagine the author making an attempt at aligning the definitions, but since this is a generated file, the lack of that attempt is understandable.

I propose the following change to the generation script, generate-lwlocknames.pl

-print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
+printf $h "#define %-30s %s\n", "${lockname}Lock", "(&MainLWLockArray[$lockidx].lock)";

which produces the lock names in this format

#define ShmemIndexLock                 (&MainLWLockArray[1].lock)
#define OidGenLock                     (&MainLWLockArray[2].lock)
#define XidGenLock                     (&MainLWLockArray[3].lock)
#define ProcArrayLock                  (&MainLWLockArray[4].lock)
#define SInvalReadLock                 (&MainLWLockArray[5].lock)
...

Yet another format, which I prefer, can be achieved by right-aligning the lock names. In addition to aligning the definitions, it also aligns the 'Lock' suffix in all the names. But as they say, beauty is in the eye of the beholder, so this one may be actively disliked by others.

-print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
+printf $h "#define %30s %s\n", "${lockname}Lock", "(&MainLWLockArray[$lockidx].lock)";

#define                 ShmemIndexLock (&MainLWLockArray[1].lock)
#define                     OidGenLock (&MainLWLockArray[2].lock)
#define                     XidGenLock (&MainLWLockArray[3].lock)
#define                  ProcArrayLock (&MainLWLockArray[4].lock)
#define                 SInvalReadLock (&MainLWLockArray[5].lock)
...

The lockname column needs to be at least 30 chars wide to accommodate the longest of the lock names, 'DynamicSharedMemoryControlLock'.

Best regards,
Gurjeet
http://Gurje.et
Re: lwlocknames.h beautification attempt
Gurjeet Singh writes:
> I propose the following change to the generation script, generate-lwlocknames.pl
> ...
> which produces the lock names in this format

> #define ShmemIndexLock                 (&MainLWLockArray[1].lock)
> #define OidGenLock                     (&MainLWLockArray[2].lock)
> #define XidGenLock                     (&MainLWLockArray[3].lock)
> #define ProcArrayLock                  (&MainLWLockArray[4].lock)
> #define SInvalReadLock                 (&MainLWLockArray[5].lock)

This looks reasonably in line with project style ...

> Yet another format, which I prefer, can be achieved by right-aligning the lock names.

> #define                 ShmemIndexLock (&MainLWLockArray[1].lock)
> #define                     OidGenLock (&MainLWLockArray[2].lock)
> #define                     XidGenLock (&MainLWLockArray[3].lock)
> #define                  ProcArrayLock (&MainLWLockArray[4].lock)
> #define                 SInvalReadLock (&MainLWLockArray[5].lock)

... but that doesn't. I challenge you to provide even one example of that layout in our source tree, or to explain why it's better.

regards, tom lane
Re: lwlocknames.h beautification attempt
On Sat, Mar 1, 2025 at 10:26 PM Tom Lane wrote: > > Gurjeet Singh writes: > > I propose the following change to the generation script, > > generate-lwlocknames.pl > > ... > > which produces the lock names in this format > > > #define ShmemIndexLock (&MainLWLockArray[1].lock) > > #define OidGenLock (&MainLWLockArray[2].lock) > > #define XidGenLock (&MainLWLockArray[3].lock) > > #define ProcArrayLock (&MainLWLockArray[4].lock) > > #define SInvalReadLock (&MainLWLockArray[5].lock) > > This looks reasonably in line with project style ... Should I create a commitfest entry for this patch, or is it uncontroversial enough and small enough to not warrant that? > > Yet another format, which I prefer, can be achieved by right-aligning the > > lock names. > > > #define ShmemIndexLock (&MainLWLockArray[1].lock) > > #define OidGenLock (&MainLWLockArray[2].lock) > > #define XidGenLock (&MainLWLockArray[3].lock) > > #define ProcArrayLock (&MainLWLockArray[4].lock) > > #define SInvalReadLock (&MainLWLockArray[5].lock) > > ... but that doesn't. I challenge you to provide even one example > of that layout in our source tree, or to explain why it's better. I haven't seen this style in any other project, let alone in Postgres. I just mentioned it since I personally liked it slightly better after trying a couple of different styles, because of the aligned suffix in the lock names. Best regards, Gurjeet http://Gurje.et
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, Feb 27, 2025 at 7:57 PM Peter Geoghegan wrote: > On Thu, Feb 27, 2025 at 3:42 PM Robert Haas wrote: > > Good documentation could help. Attached revision adds an example that shows how "Index Searches: N" can vary. This appears in "14.1.2. EXPLAIN ANALYZE". Other changes in this revision: * Improved commit message. * We now consistently show "Index Searches: N" after all other scan related output, so that it will reliably appear immediately before the "Buffers: " line. This seemed slightly better, since it is often useful to consider these two numbers together. My current plan is to commit this patch on Wednesday or Thursday, barring any objections. Thanks -- Peter Geoghegan 0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch Description: Binary data
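As a rough sketch of the kind of query where the new counter is interesting (the table and index are assumed to be the regression-test table tenk1 and its index on thousand; only the shape of the output is indicated here, since the exact text comes from the patch):

```sql
-- Hypothetical example: with the patch, the index scan node's output would
-- include an "Index Searches: N" line (shown just before "Buffers:"), and a
-- scalar-array qual like this can require one search per array element.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tenk1
WHERE thousand = ANY ('{1, 500, 999}'::int[]);
```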
Re: access numeric data in module
On Sat, 1 Mar 2025 at 17:33, Tom Lane wrote:
> But IMO you still haven't made an acceptable case for deciding that these data structures aren't private to numeric.c. What behaviors do you actually need that aren't accessible via the existing exported functions?

FWIW, in pg_duckdb we would definitely have liked to have access to some of the currently unexposed numeric internals. We vendored in some of the definitions that we required ourselves[1], but it would be very nice if we didn't have to.

I cannot speak for Ed's reasoning, but for pg_duckdb the reason we cannot use many of the already exposed functions is that they are not thread-safe, so we created our own thread-safe way of constructing these values. Another reason is so we can construct numeric types from int128/uint128 types efficiently.

I would be very happy if we'd have these definitions available, even if they would change in a future PG version (that's what you sign up for as an extension author). And indeed, as Robert already said, numeric seems to be the only type that has this kind of restriction on viewing the internal representation.

[1]: https://github.com/duckdb/pg_duckdb/blob/main/include/pgduckdb/vendor/pg_numeric_c.hpp
Re: access numeric data in module
Tom- I understand that you are concerned about future maintenance costs vs benefits of this change. I hope that I can address those concerns. An important thing to note is that there are two different structures that represent numeric values: * NumericData is an opaque structure that is defined in numeric.c. It is this struct that is used to store values. The patch I submitted has this structure remain opaque and in numeric.c. Its internals are messy and subject to future changes. I agree that third parties should not have access to this. Of note is that the type Numeric is a typedef of NumericData*. * NumericVar is a user-friendly structure that already exists. It is this structure that I propose moving to numeric.h. There are functions that exist to convert it to and from NumericData. It is these functions that I propose giving access to. What the patch boils down to is the movement of NumericVar to numeric.h along with function declarations for the basic function to work with it and a few pre-processor declarations. I agree that there is the potential for future maintenance costs here. However, any future changes to NumericData would necessitate updating the code to convert to and from NumericVar regardless of the proposed changes. I think that this small increase in costs is outweighed by the benefits of allowing third parties to access this powerful datatype. As for the reason that I would like to make this change: I am the maintainer of the PL/Haskell extension. (It allows the use of Haskell code as a procedural language. https://github.com/ed-o-saurus/PLHaskell) In the extension, users can currently pass several Postgres types and also have the function return them. This is accomplished by using the functions and macros that convert between Datums and C data types. (For example DatumGetFloat8 and Float8GetDatum to handle double-precision floating point values) I would like to add support for the use of the numeric type to the extension. To this end, I would need to create a Haskell type that mirrors the Postgres numeric type. Passed Haskell values would be instantiated by reading the data from Postgres. Conversely, returned values would be converted to the Postgres type. Internally, users would be able to perform mathematical operations with the Haskell values like any other type. Currently, there is no way for a third-party extension to get needed information about numeric values or build new numeric values. The proposed changes would remedy this. An alternative approach would be to make available calls to read and create numeric data. However, as the NumericVar struct already exists, I feel that utilizing it is the more natural approach. What do you think? -Ed On Sat, Mar 1, 2025 at 3:32 PM Ed Behn wrote: > Tom- > I think that I can allay your concerns. Please give me a day or so to > put together my more complete thoughts on the matter. I'll be in touch. > > -Ed > > On Sat, Mar 1, 2025 at 11:33 AM Tom Lane wrote: > >> Ed Behn writes: >> >> There is actually no new code. Code is simply moved from numeric.c to >> >> numeric.h. >> >> I will absolutely not hold still for that. It would mean that any >> time we want to think about messing with the contents of numerics, >> we need to examine more or less the whole Postgres code base to see >> what else is poking into those structures. >> >> If we must do something like this, then a separate header >> "numeric_internal.h" or something like that would reduce the blast >> radius for changes. 
But IMO you still haven't made an acceptable case >> for deciding that these data structures aren't private to numeric.c. >> What behaviors do you actually need that aren't accessible via the >> existing exported functons? >> >> regards, tom lane >> >
Re: bug when apply fast default mechanism for adding new column over domain with default value
jian he writes: > So I duplicated StoreAttrDefault and removed pg_attribute.atthasdef, > pg_attrdef related code. > and it works fine. I think the fundamental problem here is that StoreAttrDefault shouldn't be involved in this in the first place. It looks like somebody did that in the hopes of avoiding two updates of the new pg_attribute row (one to set atthasdef and the other to fill attmissingval), but it's just bad code structure. We should take that code out of StoreAttrDefault, not duplicate it, because the assumption that the missingval is identical to what goes into pg_attrdef is just wrong. (We could possibly get back down to one update by moving code in ATExecAddColumn so that we know before calling InsertPgAttributeTuples whether the column will have a non-null default, and then we could set atthasdef correctly in the originally-inserted tuple. That's an optimization though, not part of the basic bug fix.) Also, since the "if (RELKIND_HAS_STORAGE(relkind))" stanza in ATExecAddColumn is already calling expression_planner, we should be able to avoid doing that twice on the way to discovering whether the expression is a constant. I kind of feel that StoreAttrMissingVal belongs in heap.c; it's got about nothing to do with pg_attrdef. regards, tom lane
Re: Space missing from EXPLAIN output
On Fri, Feb 28, 2025 at 12:12 PM Thom Brown wrote: > Rebased and attached. Thanks, committed. Sorry for the mistake and thanks for the patch. -- Robert Haas EDB: http://www.enterprisedb.com
RE: SIMD optimization for list_sort
>> > I don't think "another extension might use it someday" makes a very strong >> > case, >> > particularly for something that requires a new dependency. >> >> The x86-simdsort library is an optional dependency in Postgres. Also the new >> list sort implementation which uses the x86-simdsort library does not impact >> any of the existing workflows in Postgres. >"Optional" and "Does not impact" are not great selling points to get >us to take a 1500 line patch. As we told you in November, list_sort >isn't critical for us. You need to start with the user and work >backwards to the technology. Don't pick a technology and try to sell >people on using it. Agree on starting from the user problem and work towards technology. As stated upthread, the problem being addressed here is optimizing pg_vector list_sort (which relies on postgres list_sort) to speed up HNSW index construction. The results show 7-10% improvements on vector workloads using the library. Given the same x86-simdsort library is intended to optimize 1) list_sort 2) tuple sort, it didn't make sense to duplicate the work to integrate it in pg_vector for list_sort, and then again in postgres for tuple-sort. We're trying to tackle list_sort and tuple_sort in separate patches using the same x86-simdsort library, with the goal to optimize both. Let us know if this approach is preferred and if the list_sort patch could be reviewed and any other tests we should do, or would you rather see tuple_sort benchmarks. - Akash Shankaran
Re: Replace IN VALUES with ANY in WHERE clauses during optimization
On 28.02.2025 14:48, Alena Rybakina wrote: Hi! On 21.02.2025 00:09, Alena Rybakina wrote: Hi! On 09.02.2025 18:38, Alexander Korotkov wrote: Also, aren't we too restrictive while requiring is_simple_values_sequence()? For instance, I believe cases like this (containing Var) could be transformed too. select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1))); I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied. I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence. I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think? Considering it again, I think we can't face problems like that because we don't work with join. I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation. -- Regards, Alena Rybakina Postgres Professional diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 08c4648f936..7c1b2f4cfe6 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1215,8 +1215,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) } /* - * The function traverses the tree looking for elements of type var. - * If it finds it, it returns true. + * The function traverses the tree looking for subqueries particularly + * elements of type RangeTblEntry. If it finds it, it returns true. */ static bool values_simplicity_check_walker(Node *node, void *ctx) @@ -1225,7 +1225,7 @@ values_simplicity_check_walker(Node *node, void *ctx) { return false; } - else if(IsA(node, Var)) + else if(IsA(node, RangeTblEntry)) return true; else if(IsA(node, Query)) return query_tree_walker((Query *) node, @@ -1304,6 +1304,16 @@ convert_VALUES_to_ANY(Query *query, Node *testexpr) value = eval_const_expressions(NULL, value); + if(IsA(value, Var)) + { + /* +* Upper-level vars will now be one level closer to their +* parent than before; in particular, anything that had been level 1 +* becomes level zero. 
+ */ + IncrementVarSublevelsUp(value, -1, 1); + } + if (!IsA(value, Const)) have_param = true; else if (((Const *)value)->constisnull) diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 5138a00349c..63e0114f177 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3095,3 +3095,14 @@ SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3)); -> Values Scan on "*VALUES*" (5 rows) +EXPLAIN (COSTS OFF) +SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1))); +QUERY PLAN +-- + Nested Loop + Join Filter: (t2.ten = ANY (ARRAY[t1.ten, 1])) + -> Seq Scan on onek t1 + -> Materialize + -> Seq Scan on onek t2 +(5 rows) + diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 480f8d7b852..088619f0ffc 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1357,3 +1357,6 @@ SELECT ten FROM onek t WHERE 1.0 IN ((VALUES (1), (3))::integer); EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3)); + +EXPLAIN (COSTS OFF) +SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1))); From d000645cb3da4c932bccb98ab39be890aa96383f Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Sat, 1 Mar 2025 14:28:58 +0300 Subject: [PATCH 2/2] Add an implementation of the x IN VALUES to 'x = ANY [...]', a special case of the ScalarArrayOpExpr operation. It will simplify the query tree, eliminating the appearance of an unnecessary join. Since VALUES describes a relational table, and the value of such a list is a table row, the optimizer will likely face an underestimation problem due to the inability to estimate cardinality through MCV statistics. The cardinality evaluation mechanism can work with the array inclusion check operation. If the array is small enough (< 100 elements), it will perform a statistical evaluation element by element. We perform the transformation in the convert_ANY_sublink_to_join if VALUES RTE is proper and the transformation is convertible. The conversion is only possible for operations on scalar va
Re: [PATCH] Add get_bytes() and set_bytes() functions
On Fri, 24 Jan 2025 at 13:00, Aleksander Alekseev wrote: > > Thank you. Here is the corrected patch. > This looks pretty good to me. I have a just a couple of minor comments: * The loop in bytea_integer() can be written more simply as a "for" loop. Given that it's only a few lines of code, it might as well just be coded directly in each cast function, which avoids the need to go via a 64-bit integer for every case. In addition, it should use the BITS_PER_BYTE macro rather than "8". Doing that leads to code that's consistent with bittoint4() and bittoint8(). * In pg_proc.dat, it's not necessary to write "proleakproof => 'f'", because that's the default, and no other function does that. * I think it's worth using slightly more non-trivial doc examples, including positive and negative cases, to make the behaviour more obvious. * I also tweaked the regression tests a bit, and copied the existing test style which displays both the expected and actual results from each test. With those updates, I think this is ready for commit, which I'll do in a day or two, if there are no further comments. Regards, Dean From 5c0572a0930067f7244606384542d670cd1e4aa6 Mon Sep 17 00:00:00 2001 From: Dean Rasheed Date: Sat, 1 Mar 2025 10:49:24 + Subject: [PATCH v10] Allow casting between bytea and integer types. This allows smallint, integer, and bigint values to be cast to and from bytea. The bytea value is the two's complement representation of the integer, with the most significant byte first. For example: 1234::bytea -> \x04d2 (-1234)::bytea -> \xfb2e Author: Aleksander Alekseev Reviewed-by: Joel Jacobson Reviewed-by: Yugo Nagata Reviewed-by: Peter Eisentraut Reviewed-by: Michael Paquier Reviewed-by: Dean Rasheed Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com --- doc/src/sgml/func.sgml | 17 src/backend/utils/adt/varlena.c | 90 src/include/catalog/pg_cast.dat | 14 src/include/catalog/pg_proc.dat | 19 + src/test/regress/expected/opr_sanity.out | 3 + src/test/regress/expected/strings.out| 102 +++ src/test/regress/sql/strings.sql | 29 +++ 7 files changed, 274 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0e6c534965..6b15f167d0 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5035,6 +5035,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + In addition, it is possible to cast integral values to and from type + bytea. Casting an integer to bytea produces + 2, 4, or 8 bytes, depending on the width of the integer type. The result + is the two's complement representation of the integer, with the most + significant byte first. Some examples: + +1234::smallint::bytea \x04d2 +cast(1234 as bytea)\x04d2 +cast(-1234 as bytea) \xfb2e +'\x8000'::bytea::smallint -32768 +'\x8000'::bytea::integer 32768 + + Casting a bytea to an integer will raise an error if the + length of the bytea exceeds the width of the integer type. 
+ + See also the aggregate function string_agg in and the large object functions diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index e455657170..d22648a7e4 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -4057,6 +4057,96 @@ bytea_sortsupport(PG_FUNCTION_ARGS) PG_RETURN_VOID(); } +/* Cast bytea -> int2 */ +Datum +bytea_int2(PG_FUNCTION_ARGS) +{ + bytea *v = PG_GETARG_BYTEA_PP(0); + int len = VARSIZE_ANY_EXHDR(v); + uint16 result; + + if (len > sizeof(result)) + ereport(ERROR, +errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), +errmsg("smallint out of range")); + + result = 0; + for (int i = 0; i < len; i++) + { + result <<= BITS_PER_BYTE; + result |= ((unsigned char *) VARDATA_ANY(v))[i]; + } + + PG_RETURN_INT16(result); +} + +/* Cast bytea -> int4 */ +Datum +bytea_int4(PG_FUNCTION_ARGS) +{ + bytea *v = PG_GETARG_BYTEA_PP(0); + int len = VARSIZE_ANY_EXHDR(v); + uint32 result; + + if (len > sizeof(result)) + ereport(ERROR, +errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), +errmsg("integer out of range")); + + result = 0; + for (int i = 0; i < len; i++) + { + result <<= BITS_PER_BYTE; + result |= ((unsigned char *) VARDATA_ANY(v))[i]; + } + + PG_RETURN_INT32(result); +} + +/* Cast bytea -> int8 */ +Datum +bytea_int8(PG_FUNCTION_ARGS) +{ + bytea *v = PG_GETARG_BYTEA_PP(0); + int len = VARSIZE_ANY_EXHDR(v); + uint64 result; + + if (len > sizeof(result)) + ereport(ERROR, +errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), +errmsg("bigint out of range")); + + result = 0; + for (int i = 0; i < len; i++) + { + result <<= BITS_PER_BYTE; + result |= ((unsigned char *) VARDATA_ANY(v))[i]; + } + + PG_RETURN_INT64(result); +} + +/* Cast int2 -> byte
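For reference, a few usage examples of the casts this patch adds; the expected results are taken from (or follow directly from) the patch's own documentation examples:

```sql
-- Casts proposed by the patch (not available in released versions);
-- results are the two's complement representation, most significant byte first.
SELECT 1234::smallint::bytea;        -- \x04d2
SELECT (-1234)::smallint::bytea;     -- \xfb2e
SELECT '\x8000'::bytea::smallint;    -- -32768
SELECT '\x8000'::bytea::integer;     -- 32768
```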
Re: SQLFunctionCache and generic plans
Hi pá 28. 2. 2025 v 7:29 odesílatel Pavel Stehule napsal: > Hi > > čt 27. 2. 2025 v 21:45 odesílatel Tom Lane napsal: > >> Pavel Stehule writes: >> > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane napsal: >> >> So taken together, our results are all over the map, anywhere >> >> from 7% speedup to 7% slowdown. My usual rule of thumb is that >> >> > Where do you see 7% speedup? Few lines up you wrote 0.7% faster. >> >> Alexander got that on the fx4 case, according to his response a >> few messages ago [1]. It'd be good if someone else could reproduce >> that, because right now we have two "it's slower" results versus >> only one "it's faster". >> > > ok > > here is a profile from master > > 6.98% postgres postgres [.] hash_bytes >6.30% postgres postgres [.] palloc0 >3.57% postgres postgres [.] SearchCatCacheInternal >3.29% postgres postgres [.] AllocSetAlloc >2.65% postgres plpgsql.so [.] exec_stmts >2.55% postgres postgres [.] expression_tree_walker_impl >2.34% postgres postgres [.] _SPI_execute_plan >2.13% postgres postgres [.] CheckExprStillValid >2.02% postgres postgres [.] fmgr_info_cxt_security >1.89% postgres postgres [.] ExecInitFunc >1.51% postgres postgres [.] ExecInterpExpr >1.48% postgres postgres [.] ResourceOwnerForget >1.44% postgres postgres [.] AllocSetReset >1.35% postgres postgres [.] MemoryContextCreate >1.30% postgres plpgsql.so [.] plpgsql_exec_function >1.29% postgres libc.so.6 [.] __memcmp_sse2 >1.24% postgres postgres [.] MemoryContextDelete >1.13% postgres postgres [.] check_stack_depth >1.11% postgres postgres [.] AllocSetContextCreateInternal >1.09% postgres postgres [.] resolve_polymorphic_argtypes >1.08% postgres postgres [.] hash_search_with_hash_value >1.07% postgres postgres [.] standard_ExecutorStart >1.07% postgres postgres [.] ExprEvalPushStep >1.04% postgres postgres [.] ExecInitExprRec >0.95% postgres plpgsql.so [.] plpgsql_estate_setup >0.91% postgres postgres [.] ExecReadyInterpretedExp > > and from patched > >7.08% postgres postgres [.] hash_bytes >6.25% postgres postgres [.] palloc0 >3.52% postgres postgres [.] SearchCatCacheInternal >3.30% postgres postgres [.] AllocSetAlloc >2.39% postgres postgres [.] expression_tree_walker_impl >2.37% postgres plpgsql.so [.] exec_stmts >2.15% postgres postgres [.] _SPI_execute_plan >2.10% postgres postgres [.] CheckExprStillValid >1.94% postgres postgres [.] fmgr_info_cxt_security >1.71% postgres postgres [.] ExecInitFunc >1.41% postgres postgres [.] AllocSetReset >1.40% postgres postgres [.] ExecInterpExpr >1.38% postgres postgres [.] ExprEvalPushStep >1.34% postgres postgres [.] ResourceOwnerForget >1.31% postgres postgres [.] MemoryContextDelete >1.24% postgres libc.so.6 [.] __memcmp_sse2 >1.21% postgres postgres [.] MemoryContextCreate >1.18% postgres postgres [.] AllocSetContextCreateInternal >1.17% postgres postgres [.] hash_search_with_hash_value >1.13% postgres postgres [.] resolve_polymorphic_argtypes >1.13% postgres plpgsql.so [.] plpgsql_exec_function >1.03% postgres postgres [.] standard_ExecutorStart >0.98% postgres postgres [.] ExecInitExprRec >0.96% postgres postgres [.] 
check_stack_depth > > looks so there is only one significant differences > > ExprEvalPushStep 1.07 x 1.38% > > Regards > > Pavel > > compiled without assertions on gcc 15 with 02 > > vendor_id : GenuineIntel > cpu family : 6 > model : 42 > model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz > stepping : 7 > microcode : 0x2f > cpu MHz : 2691.102 > cache size : 6144 KB > > I tested the patches on another notebook with more recent cpu vendor_id : GenuineIntel cpu family : 6 model : 154 model name : 12th Gen Intel(R) Core(TM) i7-12700H stepping : 3 microcode : 0x436 cpu MHz : 400.000 cache size : 24576 KB And the difference are smaller - about 3% Regards Pavel > > > >> regards, tom lane >> >> [1] >> https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru >> >
Re: [PATCH] Add regression tests of ecpg command notice (error / warning)
On Fri, Feb 28, 2025 at 11:27 PM Fujii Masao wrote:
> On 2025/02/28 9:24, Ryo Kanbayashi wrote:
> > I have rewritten my patch in TAP test style :)
> > Files for the build are also updated.
>
> Thanks for updating the patch!

Thanks for the review :)

> +'tests': [
> +  't/001_ecpg_notice.pl',
> +  't/002_ecpg_notice_informix.pl',
>
> Since neither test emits "notice" messages, shouldn't the test file names be revised to reflect this?

I replaced "notice" with "err_warn_msg".

> Also, I'm unsure if it's ideal to place input files directly under the "t" directory. I looked for similar TAP tests with input files, but I couldn't find any examples to guide this decision...

I couldn't find any either, so the file locations are unchanged.

> +program_help_ok('ecpg');
> +program_version_ok('ecpg');
> +program_options_handling_ok('ecpg');
> +command_fails(['ecpg'], 'ecpg without arguments fails');
>
> These checks seem unnecessary in 002 since they're already covered in 001.

I have applied the above.

---
Great regards,
Ryo Kanbayashi

ecpg-notice-regress-patch-tap-ver-rebased.patch
Description: Binary data
Re: RFC: Additional Directory for Extensions
Hi everyone, I have finally been able to test the patch in a Kubernetes environment with CloudNativePG, thanks to Niccolò Fei and Marco Nenciarini, who created a pilot patch for CloudNativePG ( https://github.com/cloudnative-pg/cloudnative-pg/pull/6546). In the meantime, Kubernetes is likely adding the ImageVolume feature starting from the upcoming version 1.33. I will write a blog post soon about how CloudNativePG will benefit from this feature. See https://github.com/kubernetes/enhancements/issues/4639. Although the steps are not easily reproducible by everyone, I can confirm that I successfully mounted a volume in the Postgres pod using a container image that includes only pgvector (with a size of 1.6MB - see https://github.com/EnterpriseDB/pgvector/blob/dev/5645/Dockerfile.cnpg). By setting: dynamic_library_path = '$libdir:/extensions/pgvector/lib' extension_control_path = '$system:/extensions/pgvector/share' I was able to run the following queries: postgres=# SELECT * FROM pg_available_extensions WHERE name = 'vector'; -[ RECORD 1 ]-+- name | vector default_version | 0.8.0 installed_version | comment | vector data type and ivfflat and hnsw access methods postgres=# SELECT * FROM pg_available_extensions WHERE name = 'vector'; -[ RECORD 1 ]-+- name | vector default_version | 0.8.0 installed_version | comment | vector data type and ivfflat and hnsw access methods I also successfully ran the following: postgres=# SELECT * FROM pg_extension_update_paths('vector'); By emptying the content of `extension_control_path`, the vector extension disappeared from the list. postgres=# SHOW extension_control_path ; extension_control_path $system (1 row) postgres=# SELECT * FROM pg_available_extensions WHERE name = 'vector'; name | default_version | installed_version | comment --+-+---+- (0 rows) postgres=# SELECT * FROM pg_available_extension_versions WHERE name = 'vector'; name | version | installed | superuser | trusted | relocatable | schema | requires | comment --+-+---+---+-+-++--+- (0 rows) In my opinion, the patch already helps a lot and does what I can reasonably expect from a first iteration of improvements in enabling immutable container images that ship a self-contained extension to be dynamically loaded and unloaded from a Postgres cluster in Kubernetes. From here, it is all about learning how to improve things with an exploratory mindset with future versions of Postgres and Kubernetes. It's a long journey but this is a fundamental step in the right direction. Let me know if there's more testing for me to do. The documentation looks clear to me. Thank you to everyone who contributed to this patch, from the initial discussions to the development phase. I sincerely hope this is included in Postgres 18. Ciao, Gabriele On Fri, 28 Feb 2025 at 16:36, Matheus Alcantara wrote: > Hi > > On Tue, Feb 25, 2025 at 5:29 PM Matheus Alcantara > wrote: > > Fixed on the attached v3. > > > After I've sent the v3 patch I noticed that the tests were failing on > windows. > The problem was on TAP test that was using ":" as a separator on > extension_control_path and also the path was not being escaped correctly > resulting in a wrong path configuration. > > Attached v4 with all these fixes. > > -- > Matheus Alcantara > -- Gabriele Bartolini VP, Chief Architect, Kubernetes enterprisedb.com
Re: Separate GUC for replication origins
On Thu, Feb 13, 2025 at 6:48 AM Masahiko Sawada wrote: > > Thank you for updating the patch! The patch looks mostly good to me. > + /* + * Prior to PostgreSQL 18, max_replication_slots was used to set the + * number of replication origins. For backward compatibility, -1 indicates + * to use the fallback value (max_replication_slots). + */ + if (max_replication_origin_sessions == -1) Shouldn't we let users use max_replication_origin_sessions for subscribers? Maintaining this mapping for a long time can create confusion such that some users will keep using max_replication_slots for origins, and others will start using max_replication_origin_sessions. Such a mapping would be useful if we were doing something like this in back-branches, but doing it in a major version appears to be more of a maintenance burden. -- With Regards, Amit Kapila.
Re: DOCS - Generated Column Replication Examples
On Mon, Feb 3, 2025 at 4:23 AM Peter Smith wrote: > > A recent commit [1] added a new section "29.6. Generated Column > Replication" to the documentation [2]. But, no "Examples" were > included. > > I've created this new thread to make a case for adding an "Examples" > subsection to that page. > > (the proposed examples patch is attached) > > == > > 1. Reasons AGAINST Adding Examples to this section (and why I disagree): > > 1a. The necessary information to describe the feature can already be > found in the text. > I still feel that the current text and example given on the page are sufficient for now. This is a new feature, and I think it is better to get some feedback from users. We can add more examples or information based on real user feedback. Also, we haven't got any other response on this thread yet. So, I think we can wait for a week or so more to see if anyone else also thinks that the additional examples will be useful; if not, then return this patch for now. -- With Regards, Amit Kapila.
Re: access numeric data in module
Upon further review, I've updated the patch. This avoids possible name conflicts with other functions. See attached. On Sat, Feb 8, 2025 at 3:24 PM Ed Behn wrote: > I've created a patch (attached) to implement the changes discussed below. > > This change moves the definition of the NumericVar structure to numeric.h. > Function definitions for functions used to work with NumericVar are also > moved to the header as are definitions of functions used to convert > NumericVar to Numeric. (Numeric is used to store numeric and decimal types.) > > All of this is so that third-party libraries can access numeric and > decimal values without having to access the opaque Numeric structure. > > There is actually no new code. Code is simply moved from numeric.c to > numeric.h. > > This is a patch against branch master. > > This successfully compiles and is tested with regression tests. > > Also attached is a code sample that uses the change. > > Please provide feedback. I'm planning to submit this for the March > commitfest. > >-Ed > > On Wed, Sep 18, 2024 at 9:50 AM Robert Haas wrote: > >> On Sat, Sep 14, 2024 at 2:10 PM Ed Behn wrote: >> > Was there a resolution of this? I'm wondering if it is worth it for >> me to submit a PR for the next commitfest. >> >> Well, it seems like what you want is different than what I want, and >> what Tom wants is different from both of us. I'd like there to be a >> way forward here but at the moment I'm not quite sure what it is. >> >> -- >> Robert Haas >> EDB: http://www.enterprisedb.com >> > diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c index 40dcbc7b671..ec5e4b5ee32 100644 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -45,65 +45,11 @@ /* -- * Uncomment the following to enable compilation of dump_numeric() - * and dump_var() and to get a dump of any result produced by make_result(). + * and dump_var() and to get a dump of any result produced by numeric_make_result(). * -- #define NUMERIC_DEBUG */ - -/* -- - * Local data types - * - * Numeric values are represented in a base-NBASE floating point format. - * Each "digit" ranges from 0 to NBASE-1. The type NumericDigit is signed - * and wide enough to store a digit. We assume that NBASE*NBASE can fit in - * an int. Although the purely calculational routines could handle any even - * NBASE that's less than sqrt(INT_MAX), in practice we are only interested - * in NBASE a power of ten, so that I/O conversions and decimal rounding - * are easy. Also, it's actually more efficient if NBASE is rather less than - * sqrt(INT_MAX), so that there is "headroom" for mul_var and div_var to - * postpone processing carries. - * - * Values of NBASE other than 1 are considered of historical interest only - * and are no longer supported in any sense; no mechanism exists for the client - * to discover the base, so every client supporting binary mode expects the - * base-1 format. If you plan to change this, also note the numeric - * abbreviation code, which assumes NBASE=1. 
- * -- - */ - -#if 0 -#define NBASE 10 -#define HALF_NBASE 5 -#define DEC_DIGITS 1 /* decimal digits per NBASE digit */ -#define MUL_GUARD_DIGITS 4 /* these are measured in NBASE digits */ -#define DIV_GUARD_DIGITS 8 - -typedef signed char NumericDigit; -#endif - -#if 0 -#define NBASE 100 -#define HALF_NBASE 50 -#define DEC_DIGITS 2 /* decimal digits per NBASE digit */ -#define MUL_GUARD_DIGITS 3 /* these are measured in NBASE digits */ -#define DIV_GUARD_DIGITS 6 - -typedef signed char NumericDigit; -#endif - -#if 1 -#define NBASE 1 -#define HALF_NBASE 5000 -#define DEC_DIGITS 4 /* decimal digits per NBASE digit */ -#define MUL_GUARD_DIGITS 2 /* these are measured in NBASE digits */ -#define DIV_GUARD_DIGITS 4 - -typedef int16 NumericDigit; -#endif - -#define NBASE_SQR (NBASE * NBASE) - /* * The Numeric type as stored on disk. * @@ -252,75 +198,6 @@ struct NumericData | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \ : ((n)->choice.n_long.n_weight)) -/* - * Maximum weight of a stored Numeric value (based on the use of int16 for the - * weight in NumericLong). Note that intermediate values held in NumericVar - * and NumericSumAccum variables may have much larger weights. - */ -#define NUMERIC_WEIGHT_MAX PG_INT16_MAX - -/* -- - * NumericVar is the format we use for arithmetic. The digit-array part - * is the same as the NumericData storage format, but the header is more - * complex. - * - * The value represented by a NumericVar is determined by the sign, weight, - * ndigits, and digits[] array. If it is a "special" value (NaN or Inf) - * then only the sign field matters; ndigits should be zero, and the weight - * and dscale fields are ignored. - * - * Note: the first digit of a NumericVar's value is assumed to be multiplied - * by NBASE ** weight. Another way to say it is that there are weig
Re: Amcheck verification of GiST and GIN
On Fri, 28 Feb 2025 at 23:31, Mark Dilger wrote: > The only obvious definition of "wrong" for this is that gin index scans > return different result sets than table scans over the same data. Using your > much smaller reproducible test case, and adding rows like: Yeah, you are 100% right. Actually, along this thread, we have not spotted any GIN bugs yet, only GIN amcheck bugs. This turns out to also be a GIN amcheck bug: ``` DEBUG: comparing for offset 79 category 2 key attnum 1 DEBUG: comparing for offset 80 category 3 key attnum 1 DEBUG: comparing for offset 81 category 0 key attnum 2 LOG: index "ginidx" has wrong tuple order on entry tree page, block 2, offset 81, rightlink 4294967295 DEBUG: comparing for offset 82 category 0 key attnum 2 DEBUG: comparing for offset 100 category 0 key attnum 2 DEBUG: comparing for offset 101 category 2 key attnum 2 DEBUG: comparing for offset 102 category 3 key attnum 2 DEBUG: comparing for offset 103 category 0 key attnum 3 LOG: index "ginidx" has wrong tuple order on entry tree page, block 2, offset 103, rightlink 4294967295 DEBUG: comparing for offset 104 category 0 key attnum 3 DEBUG: comparing for offset 105 category 0 key attnum 3 ``` It turns out we were comparing page entries belonging to different attributes in gin_check_parent_keys_consistency. Trivial fix attached (see v37-0004). I now simply compare the current and previous attribute numbers. This resolves the issue discovered by `v0-0001-Add-a-reproducible-test-case-for-verify_gin-error.patch.no_apply`. However, the stress test still does not seem to pass. On my PC, it never ends; all processes are in DELETE waiting/UPDATE waiting state. I will take another look tomorrow. p.s. I was just about to send this message when I discovered that v34-0003-Add-gist_index_check-function-to-verify-GiST-ind.patch & v34-0005-Add-GiST-support-to-pg_amcheck.patch are now missing from this patch series ;( -- Best regards, Kirill Reshke v37-0004-Fix-for-gin_index_check.patch Description: Binary data v37-0003-Fix-wording-in-GIN-README.patch Description: Binary data v37-0002-Add-gin_index_check-to-verify-GIN-index.patch Description: Binary data v37-0001-Refactor-amcheck-internals-to-isolate-common-loc.patch Description: Binary data v37-0005-Add-gin-index-checking-test-for-jsonb-data.patch Description: Binary data v37-0006-Add-gin-to-the-create-index-concurrently-tap-tes.patch Description: Binary data v37-0007-Stress-test-verify_gin-using-pgbench.patch Description: Binary data
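For illustration, a minimal sketch of the per-attribute ordering rule described above (not the v37-0004 patch itself; the surrounding amcheck state and corruption reporting are simplified, and the GIN helpers used here are the ordinary ones from gin_private.h):

```
#include "postgres.h"
#include "access/gin_private.h"
#include "storage/bufpage.h"

/*
 * Sketch: entry-tree keys only have a defined order within one attribute,
 * so skip the key comparison whenever the attribute number changes.
 */
static void
check_entry_order_sketch(GinState *ginstate, Page page, OffsetNumber maxoff)
{
	OffsetNumber	prev_attnum = InvalidOffsetNumber;
	Datum			prev_key = (Datum) 0;
	GinNullCategory prev_category = GIN_CAT_NORM_KEY;

	for (OffsetNumber off = FirstOffsetNumber; off <= maxoff; off++)
	{
		IndexTuple		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, off));
		OffsetNumber	attnum = gintuple_get_attrnum(ginstate, itup);
		GinNullCategory category;
		Datum			key = gintuple_get_key(ginstate, itup, &category);

		/* only compare against the previous key if it belongs to the same attribute */
		if (off != FirstOffsetNumber && attnum == prev_attnum &&
			ginCompareEntries(ginstate, attnum,
							  prev_key, prev_category,
							  key, category) > 0)
			elog(LOG, "wrong tuple order on entry tree page, offset %u", off);

		prev_attnum = attnum;
		prev_key = key;
		prev_category = category;
	}
}
```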
Re: access numeric data in module
Ed Behn writes: >> There is actually no new code. Code is simply moved from numeric.c to >> numeric.h. I will absolutely not hold still for that. It would mean that any time we want to think about messing with the contents of numerics, we need to examine more or less the whole Postgres code base to see what else is poking into those structures. If we must do something like this, then a separate header "numeric_internal.h" or something like that would reduce the blast radius for changes. But IMO you still haven't made an acceptable case for deciding that these data structures aren't private to numeric.c. What behaviors do you actually need that aren't accessible via the existing exported functions? regards, tom lane
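For reference, a minimal sketch of what a module can already do through the exported fmgr entry points, without touching NumericData internals (whether these entry points cover the intended use case is exactly the question above):

```
#include "postgres.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"	/* numeric_out(), numeric_float8() */
#include "utils/numeric.h"		/* Numeric, NumericGetDatum() */

/* Convert a Numeric to double via the exported numeric_float8() function. */
static double
numeric_to_double(Numeric num)
{
	return DatumGetFloat8(DirectFunctionCall1(numeric_float8,
											  NumericGetDatum(num)));
}

/* Convert a Numeric to its text representation via numeric_out(). */
static char *
numeric_to_cstring(Numeric num)
{
	return DatumGetCString(DirectFunctionCall1(numeric_out,
											   NumericGetDatum(num)));
}
```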
Re: Statistics Import and Export
Hello Jeff, 26.02.2025 04:00, Jeff Davis wrote: I plan to commit the patches soon. It looks like 8f427187d broke pg_dump on Cygwin: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07 As far as I can see, it exits prematurely here: float reltuples = strtof(PQgetvalue(res, i, i_reltuples), NULL); because of: /* * Cygwin has a strtof() which is literally just (float)strtod(), which means * we get misrounding _and_ silent over/underflow. Using our wrapper doesn't * fix the misrounding but does fix the error checks, which cuts down on the * number of test variant files needed. */ #define HAVE_BUGGY_STRTOF 1 ... #ifdef HAVE_BUGGY_STRTOF extern float pg_strtof(const char *nptr, char **endptr); #define strtof(a,b) (pg_strtof((a),(b))) #endif and: float pg_strtof(const char *nptr, char **endptr) { ... if (errno) { /* On error, just return the error to the caller. */ return fresult; } else if ((*endptr == nptr) || isnan(fresult) || ... Best regards, Alexander Lakhin Neon (https://neon.tech)
Re: Trigger more frequent autovacuums of heavy insert tables
On Sat, Mar 01, 2025 at 08:57:52AM +0800, wenhui qiu wrote: > Based on the comments, the pcnt_unfrozen value could potentially be 0, > which would indicate that everything is frozen. Therefore, is it necessary > to handle the case where the value is 0? How so? If it's 0, then the insert threshold calculation would produce autovacuum_vacuum_insert_threshold, just like for an empty table. That seems like the intended behavior to me. -- nathan
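To make the arithmetic concrete, here is a schematic version of the calculation under discussion (variable names only approximate autovacuum.c, and the pcnt_unfrozen factor is what the patch proposes to add):

```
/*
 * Sketch: when pcnt_unfrozen is 0, the scale term drops out and the result
 * is simply autovacuum_vacuum_insert_threshold -- the same threshold an
 * empty table gets -- so no special case is required.
 */
static bool
insert_vacuum_needed_sketch(float4 reltuples, int32 relpages, int32 relallfrozen,
							int64 ins_since_vacuum,
							int vac_ins_base_thresh, float4 vac_ins_scale_factor)
{
	float4		pcnt_unfrozen = 1.0;
	float4		vacinsthresh;

	if (relpages > 0)
		pcnt_unfrozen = 1.0 - ((float4) relallfrozen / relpages);

	vacinsthresh = (float4) vac_ins_base_thresh +
		vac_ins_scale_factor * reltuples * pcnt_unfrozen;

	return ins_since_vacuum > vacinsthresh;
}
```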
Re: [Patch] add new parameter to pg_replication_origin_session_setup
I noticed that the patch needs rebasing, so here is the rebased version. Hopefully it makes to the commitfest. Doruk Yılmaz From 74a74fd02bce786093c19a23bef9444d0b8ef41d Mon Sep 17 00:00:00 2001 From: Doruk Date: Thu, 15 Aug 2024 23:34:26 +0300 Subject: [PATCH v4] pg_replication_origin_session_setup: pid parameter Since the introduction of parallel apply workers (commit 216a784829c), the replorigin_session_setup() was extended to accept an extra parameter: pid. This process ID is used to inform that multiple processes are sharing the same replication origin to apply changes in parallel. The replorigin_session_setup function has a SQL user interface: pg_replication_origin_session_setup. This commit adds an optional parameter that passes the process ID to the internal function replorigin_session_setup. It allows multiple processes to use the same replication origin if you are using the replication functions. --- doc/src/sgml/func.sgml | 8 +++- src/backend/catalog/system_functions.sql | 9 - src/backend/replication/logical/origin.c | 4 +++- src/include/catalog/pg_proc.dat | 2 +- 4 files changed, 19 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 47370e581a..e50e689fb6 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -29475,7 +29475,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset pg_replication_origin_session_setup -pg_replication_origin_session_setup ( node_name text ) +pg_replication_origin_session_setup ( node_name text , pid integer DEFAULT 0 ) void @@ -29483,6 +29483,12 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset origin, allowing replay progress to be tracked. Can only be used if no origin is currently selected. Use pg_replication_origin_session_reset to undo. +If multiple processes can safely use the same replication origin (for +example, parallel apply processes), the optional pid +parameter can be used to specify the process ID of the first process. +The first process must provide pid equals to +0 and the other processes that share the same +replication origin should provide the process ID of the first process. diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 86888cd..ebc4005 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -636,6 +636,13 @@ LANGUAGE INTERNAL CALLED ON NULL INPUT VOLATILE PARALLEL SAFE AS 'pg_stat_reset_slru'; +CREATE OR REPLACE FUNCTION + pg_replication_origin_session_setup(node_name text, pid integer DEFAULT 0) +RETURNS void +LANGUAGE INTERNAL +STRICT VOLATILE +AS 'pg_replication_origin_session_setup'; + -- -- The default permissions for functions mean that anyone can execute them. 
-- A number of functions shouldn't be executable by just anyone, but rather @@ -737,7 +744,7 @@ REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) FROM REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_reset() FROM public; -REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text) FROM public; +REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text, integer) FROM public; REVOKE EXECUTE ON FUNCTION pg_replication_origin_xact_reset() FROM public; diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c index 1b586cb1cf..9cbe1eec45 100644 --- a/src/backend/replication/logical/origin.c +++ b/src/backend/replication/logical/origin.c @@ -1355,12 +1355,14 @@ pg_replication_origin_session_setup(PG_FUNCTION_ARGS) { char *name; RepOriginId origin; + int pid; replorigin_check_prerequisites(true, false); name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0))); origin = replorigin_by_name(name, false); - replorigin_session_setup(origin, 0); + pid = PG_GETARG_INT32(1); + replorigin_session_setup(origin, pid); replorigin_session_origin = origin; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index b37e8a6f88..ea118a0563 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12063,7 +12063,7 @@ { oid => '6006', descr => 'configure session to maintain replication progress tracking for the passed in origin', proname => 'pg_replication_origin_session_setup', provolatile => 'v', - proparallel => 'u', prorettype => 'void', proargtypes => 'text', + proparallel => 'u', prorettype => 'void', proargtypes => 'text int4', prosrc => 'pg_replication_origin_session_setup' }, { oid => '6007', descr => 'teardown configured replication progress tracking', -- 2.39.5
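For illustration, a minimal sketch of the sharing scheme the new optional pid argument exposes at the SQL level, mirroring what replorigin_session_setup() already allows internally for parallel apply workers (the two wrapper helpers and the leader_pid parameter are illustrative; the origin functions themselves are the real ones):

```
#include "postgres.h"
#include "replication/origin.h"

/* The first process acquires the origin for itself: acquired_by = 0. */
static void
leader_setup_origin(const char *origin_name)
{
	RepOriginId originid = replorigin_by_name(origin_name, false);

	replorigin_session_setup(originid, 0);
}

/* Any other process sharing the origin passes the first process's PID. */
static void
worker_setup_origin(const char *origin_name, int leader_pid)
{
	RepOriginId originid = replorigin_by_name(origin_name, false);

	replorigin_session_setup(originid, leader_pid);
}
```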
Re: making EXPLAIN extensible
On Fri, Feb 28, 2025 at 3:09 PM Thom Brown wrote: > "pg_overexplain"? I love this name! And the idea sounds like a natural > evolution, so +1. Thanks. I thought about things like pg_hyperexplain or pg_explain_debug, but in the end I didn't like any of them better than overexplain. :-) > One thing I am wondering is whether extensions should be required to > prefix their EXPLAIN option with the extension name to avoid > collisions. I considered that. One advantage of doing that is that you could support autoloading. Right now, you have to LOAD 'pg_overexplain' or put it in session_preload_libraries or shared_preload_libraries in order to use it. If you required people to type EXPLAIN (pg_overexplain.range_table) instead of just EXPLAIN (range_table), then you could react to not finding any such option by trying to autoload a .so with the part of the name before the dot. But you can probably see that this idea has a couple of pretty serious weaknesses: 1. It is much more verbose. I theorize that people will be unhappy about having to type EXPLAIN (pg_overexplain.range_table) rather than just EXPLAIN (range_table). One could try to address this by renaming the extension to something shorter, like just 'oe'. Having to type EXPLAIN (oe.range_table) wouldn't be nearly as annoying. However, this seems like a pretty clear case of letting the tail wag the dog. 2. autoloading could have security concerns. This is probably fixable, but we'd need to be sure that providing a new way to trigger loading a module didn't open up any security holes. > If two extensions happen to choose the same name, it won't be possible > to use both simultaneously. That's true. Of course, a lot depends on whether we end up with 3 or 5 or 8 EXPLAIN extensions or more like 30 or 50 or 80. In the former case, the people writing those extensions will probably mostly know about each other and can just use different names. In the latter case it's a problem. My guess is it's the former case. > In what order would the options be applied? Would it be deterministic, > or weighted within the extension's configuration, or based on the > order of the options in the list? I'm not entirely sure I know which question you're asking here. If you're asking what happens if two modules try to register the same EXPLAIN option name and then a user uses it, one of the registrations will win and the other will lose. I think the second one wins. As I say above, I assume we'll find a way to not try to do that. However, I think more likely you're asking: if you load pg_fingernailexplain and pg_toenailexplain and then do EXPLAIN (toenail, fingernail) SELECT ..., in what order will the options take effect? For the answer to that question, see the commit message for 0002. > Would explain extensions be capable of modifying pre-existing core > option output, or just append to output? The interfaces we have are really only going to work for appending. Modifying would be cool, but I think it's mostly impractical. We have a framework for emitting stuff into EXPLAIN output in a way that takes into account whether you're in text mode or json or yaml or whatever, and this patch just builds on that existing framework to allow you to make extra calls to those emit-some-output functions at useful places. As a result, the patch is small and simple. If we had an existing framework for modifying stuff, then we could perhaps provide suitable places to call those functions, too. But they don't exist, and it's not easy to see how they could be created. 
I think you would need some kind of major redesign of explain.c, and I don't know how to do that without making it bloated, slow, and unmaintainable. If somebody comes up with a way of allowing certain limited types of modifications to EXPLAIN output with small, elegant-looking code changes, and if those changes seem like useful things for an extension to want to do, I'm totally on board. But I currently don't have an idea like that. > Should there be a way of determining which lines are output by which > option? An extension may output similar output to core output, making > it difficult or impossible to discern which is which. I don't think this is really going to be a problem. > Does there need to be any security considerations so that things like > RLS don't inadvertently become leaky? It's possible that there may be some security considerations, and that's worth thinking about. However, RLS disclaims support for side-channel attacks, so it's already understood to be (very) leaky. -- Robert Haas EDB: http://www.enterprisedb.com
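As a rough illustration of that append-only model: the registration API and hook signature come from the patch set and are not quoted here, so the helper below is only a sketch of the output side, showing an extension emitting extra properties through the existing ExplainProperty* machinery rather than editing core output (header locations may differ on current master):

```
#include "postgres.h"
#include "commands/explain.h"
#include "nodes/pg_list.h"
#include "nodes/plannodes.h"

/*
 * Sketch: something the patch's per-plan callback could do -- append extra
 * fields that render correctly in text, JSON, XML, and YAML output.
 */
static void
overexplain_emit_extras(PlannedStmt *plannedstmt, ExplainState *es)
{
	ExplainPropertyInteger("Range Table Length", NULL,
						   list_length(plannedstmt->rtable), es);
	ExplainPropertyText("Emitted By", "an overexplain-style extension", es);
}
```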
Re: Licence preamble update
On Sat, Mar 1, 2025 at 9:20 AM Noah Misch wrote: > > On Fri, Feb 28, 2025 at 12:07:26PM -0500, Tom Lane wrote: > > > > PGCAC holds trademarks on both "PostgreSQL" and "Postgres". We've > > been given legal advice that both terms need to be in current use > > to preserve the trademarks. Which they are and have been, but the > > present wording in COPYRIGHT doesn't align with that. The website > > phrasing will be adjusted as well. > > I'm good with the change, then. > LGTM as well. -- With Regards, Amit Kapila.
Re: Adding NetBSD and OpenBSD to Postgres CI
On Sat, Mar 1, 2025 at 6:24 AM Nazir Bilal Yavuz wrote: > I think I found the problem, sd0j's fstype is not a swap. It worked like that: > > $ disklabel -E sd0 > $ umount /usr/obj > $ disklabel -E sd0 # prompts are: m -> j -> \n -> \n -> swap -> w -> q > $ disklabel -E sd0 > $ swapon /dev/sd0j # runs successfully Thanks! I just piped those characters in and it worked, shaving another couple of minutes off. The script is probably now about as short and sweet as it needs to be? And the times are pretty good. https://cirrus-ci.com/build/5275349266726912 From 68232b16632fcc9469017ec219308c798e0d5391 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 28 Feb 2025 22:19:07 +1300 Subject: [PATCH v2 1/2] ci: Use a RAM disk for NetBSD and OpenBSD. Put the RAM disk setup for all three *BSD CI tasks into a common script, replacing the old FreeBSD-specific one from commit 0265e5c1. This makes them run 3 times and a bit over 2 times faster, respectively. NetBSD and FreeBSD can use the same one-liner to mount tmpfs, but OpenBSD needs a GCP-image specific recipe that knows where to steal a disk partition to reserve swap space to mount mfs, because its tmpfs is deprecated and currently broken. The configured size is enough for our current tests but could potentially need future tweaks. Thanks to Bilal for the disklabel incantation. Reviewed-by: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CA%2BhUKGJJ-XrPhN%2BQA4ZUfYAAXcwOSDty9t0vE9Z8__AdacKnQg%40mail.gmail.com --- .cirrus.tasks.yml | 5 ++--- src/tools/ci/gcp_freebsd_repartition.sh | 26 - src/tools/ci/gcp_ram_disk.sh| 22 + 3 files changed, 24 insertions(+), 29 deletions(-) delete mode 100755 src/tools/ci/gcp_freebsd_repartition.sh create mode 100755 src/tools/ci/gcp_ram_disk.sh diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index 91b51142d2e..c5e7b743bfb 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -155,8 +155,7 @@ task: ccache_cache: folder: $CCACHE_DIR - # Work around performance issues due to 32KB block size - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh + setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh create_user_script: | pw useradd postgres chown -R postgres:postgres . @@ -276,7 +275,7 @@ task: ccache_cache: folder: $CCACHE_DIR - + setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh create_user_script: | useradd postgres chown -R postgres:users /home/postgres diff --git a/src/tools/ci/gcp_freebsd_repartition.sh b/src/tools/ci/gcp_freebsd_repartition.sh deleted file mode 100755 index 3adb8fb88ec..000 --- a/src/tools/ci/gcp_freebsd_repartition.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/bin/sh - -set -e -set -x - -# fix backup partition table after resize -gpart recover da0 -gpart show da0 - -# delete and re-add swap partition with expanded size -swapoff -a -gpart delete -i 3 da0 -gpart add -t freebsd-swap -l swapfs -a 4096 da0 -gpart show da0 -swapon -a - -# create a file system on a memory disk backed by swap, to minimize I/O -mdconfig -a -t swap -s20g -u md1 -newfs -b 8192 -U /dev/md1 - -# migrate working directory -du -hs $CIRRUS_WORKING_DIR -mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig -mkdir $CIRRUS_WORKING_DIR -mount -o noatime /dev/md1 $CIRRUS_WORKING_DIR -cp -a $CIRRUS_WORKING_DIR.orig/ $CIRRUS_WORKING_DIR/ diff --git a/src/tools/ci/gcp_ram_disk.sh b/src/tools/ci/gcp_ram_disk.sh new file mode 100755 index 000..d48634512ac --- /dev/null +++ b/src/tools/ci/gcp_ram_disk.sh @@ -0,0 +1,22 @@ +#!/bin/sh +# Move working directory into a RAM disk for better performance. 
+ +set -e +set -x + +mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig +mkdir $CIRRUS_WORKING_DIR + +case "`uname`" in + FreeBSD|NetBSD) +mount -t tmpfs tmpfs $CIRRUS_WORKING_DIR +;; + OpenBSD) +umount /dev/sd0j # unused /usr/obj partition +printf "m j\n\n\nswap\nw\nq\n" | disklabel -E sd0 +swapon /dev/sd0j +mount -t mfs -o rw,noatime,nodev,-s=800 swap $CIRRUS_WORKING_DIR +;; +esac + +cp -a $CIRRUS_WORKING_DIR.orig/. $CIRRUS_WORKING_DIR/ -- 2.39.5 From cf2f8301f30537d7448190f9a2d0889845b4d168 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 2 Mar 2025 01:08:02 +1300 Subject: [PATCH v2 2/2] ci: Enable NetBSD and OpenBSD by default. XXX Not saying we're quite ready to do this (are we?), but cfbot will test them for this CF entry --- .cirrus.tasks.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index c5e7b743bfb..2004610989d 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -218,7 +218,6 @@ task: task: depends_on: SanityCheck - trigger_type: manual env: # Below are experimentally derived to be a decent choice. -- 2.39.5
Re: bug when apply fast default mechanism for adding new column over domain with default value
On Sat, Mar 1, 2025 at 2:43 PM Tom Lane wrote: > > I do not believe that case should require a table rewrite. > Both the default and the check constraint are immutable, > so we ought to be able to apply the check once and then > use the default as the attmissingval. > > > Attach a patch to fix this issue by cause it to table rewrite. > > I see no attached patch, but in any case forcing a table rewrite > seems like the wrong direction... > forcing table rewrite would be an easier fix. but not forcing table write seems doable. \d pg_attrdef Table "pg_catalog.pg_attrdef" Column | Type | Collation | Nullable | Default -+--+---+--+- oid | oid | | not null | adrelid | oid | | not null | adnum | smallint | | not null | adbin | pg_node_tree | C | not null | Indexes: "pg_attrdef_oid_index" PRIMARY KEY, btree (oid) "pg_attrdef_adrelid_adnum_index" UNIQUE CONSTRAINT, btree (adrelid, adnum) pg_attrdef_adrelid_adnum_index means a column can only have one default expression adbin. if we store domain's default expression in pg_attrdef it may have error like: CREATE DOMAIN int_domain AS int DEFAULT 11; ALTER TABLE t2 ADD COLUMN b int_domain default 3; ERROR: duplicate key value violates unique constraint "pg_attrdef_adrelid_adnum_index" DETAIL: Key (adrelid, adnum)=(18102, 2) already exists. currently function StoreAttrDefault will 1. set pg_attribute.atthasdef 2. compute and set atthasmissing, attmissingval 3. insert an entry in pg_attrdef. but we only want 2. So I duplicated StoreAttrDefault and removed pg_attribute.atthasdef, pg_attrdef related code. and it works fine. From 4f87b744b19f34b02817d252cc69666e33da2e18 Mon Sep 17 00:00:00 2001 From: jian he Date: Sat, 1 Mar 2025 22:01:21 +0800 Subject: [PATCH v1 1/1] apply fast default for adding new column over domain with default value discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kqgl+tnvox5lo...@mail.gmail.com --- src/backend/catalog/pg_attrdef.c | 108 + src/backend/commands/tablecmds.c | 26 - src/include/catalog/pg_attrdef.h | 3 + src/test/regress/expected/fast_default.out | 49 ++ src/test/regress/sql/fast_default.sql | 36 +++ 5 files changed, 220 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/pg_attrdef.c b/src/backend/catalog/pg_attrdef.c index 91dafc81021..3b47139f94f 100644 --- a/src/backend/catalog/pg_attrdef.c +++ b/src/backend/catalog/pg_attrdef.c @@ -31,6 +31,114 @@ #include "utils/syscache.h" + +/* + * XXX is this the righ place? + * Store a attmissingval datum value for column attnum of relation rel. + * + * This closely resembles StoreAttrDefault, like: + * Sets atthasmissing to true and adds attmissingval for the specified attnum. + * but with two differences: + * - Does not set pg_attribute.atthasdef to true. + * - Does not store the default expression in pg_attrdef. + */ +void +StoreAttrMissingVal(Relation rel, AttrNumber attnum, + Node *expr, bool is_internal, bool add_column_mode) +{ + Relation attrrel; + HeapTuple atttup; + Form_pg_attribute attStruct; + Form_pg_attribute defAttStruct; + + ExprState *exprState; + Expr *expr2 = (Expr *) expr; + EState *estate = NULL; + ExprContext *econtext; + Datum valuesAtt[Natts_pg_attribute] = {0}; + bool nullsAtt[Natts_pg_attribute] = {0}; + bool replacesAtt[Natts_pg_attribute] = {0}; + Datum missingval = (Datum) 0; + bool missingIsNull = true; + + /* + * Update the pg_attribute entry for the column to show that we store attmissingval + * on it. 
+ */ + attrrel = table_open(AttributeRelationId, RowExclusiveLock); + atttup = SearchSysCacheCopy2(ATTNUM, + ObjectIdGetDatum(RelationGetRelid(rel)), + Int16GetDatum(attnum)); + if (!HeapTupleIsValid(atttup)) + elog(ERROR, "cache lookup failed for attribute %d of relation %u", + attnum, RelationGetRelid(rel)); + attStruct = (Form_pg_attribute) GETSTRUCT(atttup); + + /* + * since we *only* restore default expression evaluated value in pg_attribute.attmissingval + * for attribute attnum, not store default expression entry on pg_attrdef + * we can't set pg_attribute.atthasdef (Anum_pg_attribute_atthasdef) to true. That's the + * main difference with StoreAttrDefault. + */ + if (!attStruct->atthasdef && attStruct->attgenerated == '\0' && + rel->rd_rel->relkind == RELKIND_RELATION && add_column_mode) + { + expr2 = expression_planner(expr2); + estate = CreateExecutorState(); + exprState = ExecPrepareExpr(expr2, estate); + econtext = GetPerTupleExprContext(estate); + + missingval = ExecEvalExpr(exprState, econtext, + &missingIsNull); + + FreeExecutorState(estate); + + defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1); + + if (missingIsNull) + missingval = (Datum) 0; + else + { + /* make a one-element array of the value */ + miss
Re: Statistics Import and Export
Hello Tom, 01.03.2025 20:04, Tom Lane wrote: Alexander Lakhin writes: It looks like 8f427187d broke pg_dump on Cygwin: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07 Yeah, Andrew and I have been puzzling over that off-list. pg_dump is definitely exiting unceremoniously. As far as I can see, it exits prematurely here: float reltuples = strtof(PQgetvalue(res, i, i_reltuples), NULL); I was suspecting those float conversions as a likely cause, but what do you think is wrong exactly? I see nothing obviously buggy in pg_strtof(). From my understanding, pg_strtof() cannot cope with endptr == NULL. I have changed that line to: char *tptr; float reltuples = strtof(PQgetvalue(res, i, i_reltuples), &tptr); and 002_compare_backups passed for me. Best regards, Alexander Lakhin Neon (https://neon.tech)
Re: Statistics Import and Export
Alexander Lakhin writes: > It looks like 8f427187d broke pg_dump on Cygwin: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07 Yeah, Andrew and I have been puzzling over that off-list. pg_dump is definitely exiting unceremoniously. > As far as I can see, it exits prematurely here: > float reltuples = strtof(PQgetvalue(res, i, > i_reltuples), NULL); I was suspecting those float conversions as a likely cause, but what do you think is wrong exactly? I see nothing obviously buggy in pg_strtof(). But I'm not sure it's worth running to ground. I don't love any of the portability-related hacks that 8f427187d made: the setlocale() call looks like something with an undesirably large blast radius, and pg_dump has never made use of strtof or f2s.c before. Sure, those *ought* to work, but they evidently don't work everywhere, and I don't especially want to expend more brain cells figuring out what's wrong here. I think we ought to cut our losses and store reltuples in string form, as Corey wanted to do originally. regards, tom lane
Re: Statistics Import and Export
> Can you expand on some of those cases? Certainly. I think one of the problems is that because this patch is solving a pg_upgrade issue, the focus is on the "dump and restore" scenarios. But pg_dump is used for much more than that, especially "dump and examine". Although pg_dump is meant to be a canonical, logical representation of your schema and data, the stats add a nondeterministic element to that. Statistical sampling is random, so pg_dump output changes with each run. (yes, COPY can also change, but much less so, as I argue later). One use case is a program that is simply using pg_dump to verify that nothing has modified your table data (I'll use a single table for these examples, but obviously this applies to a whole database as well). So let's say we create a table and populate it at time X, then check back at a later time to verify things are still exactly as we left them. dropdb gregtest createdb gregtest pgbench gregtest -i 2> /dev/null pg_dump gregtest -t pgbench_accounts > a1 sleep 10 pg_dump gregtest -t pgbench_accounts > a2 diff a1 a2 | cut -c1-50 100078c100078 < 'histogram_bounds', '{2,964,1921,2917,3892,4935 --- > 'histogram_bounds', '{7,989,1990,2969,3973,4977 While COPY is not going to promise a particular output order, the order should not change except for manual things: insert, update, delete, truncate, vacuum full, cluster (off the top of my head). What should not change the output is a background process gathering some metadata. Or someone running a database-wide ANALYZE. Another use case is someone rolling out their schema to a QA box. All the table definitions and data are checked into a git repository, with a checksum. They want to roll it out, and then verify that everything is exactly as they expect it to be. Or the program is part of a test suite that does a sanity check that the database is in an exact known state before starting. (Our system catalogs are very difficult to work with when reverse engineering objects. Thus, many programs rely on pg_dump to do the heavy lifting for them. Parsing the text file generated by pg_dump is much easier than trying to manipulate the system catalogs.) So let's say the process is to create a new database, load things into it, and then checksum the result. We can simulate that with pgbench: dropdb qa1; dropdb qa2 createdb qa1; createdb qa2 pgbench qa1 -i 2>/dev/null pgbench qa2 -i 2>/dev/null pg_dump qa1 > dump1; pg_dump qa2 > dump2 $ md5sum dump1 39a2da5e51e8541e9a2c025c918bf463 dump1 This md5sum does not match our repo! It doesn't even match the other one: $ md5sum dump2 4a977657dfdf910cb66c875d29cfebf2 dump2 It's the stats, of course, which have added a dose of randomness that was not there before, and makes our checksums useless: $ diff dump1 dump2 | cut -c1-50 100172c100172 < 'histogram_bounds', '{1,979,1974,2952,3973,4900 --- > 'histogram_bounds', '{8,1017,2054,3034,4045,513 With --no-statistics, the diff shows no difference, and the md5sum is always the same. Just to be clear, I love this patch, and I love the fact that one of our major upgrade warts is finally getting fixed. I've tried fixing it myself a few times over the last decade or so, but lacked the skills to do so. :) So I am thrilled to have this finally done. I just don't think it should be enabled by default for everything using pg_dump. For the record, I would not strongly object to having stats on by default for binary dumps, although I would prefer them off. So why not just expect people to modify their programs to use --no-statistics for cases like this?
That's certainly an option, but it's going to break a lot of existing things, and create branching code: old code: pg_dump mydb -f pg.dump new code: if pg_dump.version >= 18 pg_dump --no-statistics mydb -f pg.dump else pg_dump mydb -f pg.dump Also, anything trained to parse pg_dump output will have to learn about the new SELECT pg_restore_ calls with their multi-line formats (not 100% sure we don't have that anywhere, as things like "SELECT setval" and "SELECT set_config" are single line, but there may be existing things) Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: Statistics Import and Export
Alexander Lakhin writes: > 01.03.2025 20:04, Tom Lane wrote: >> I was suspecting those float conversions as a likely cause, but >> what do you think is wrong exactly? I see nothing obviously >> buggy in pg_strtof(). > From my understanding, pg_strtof () can't stand against endptr == NULL. D'oh! I'm blind as a bat today. > I have changed that line to: > char *tptr; > float reltuples = strtof(PQgetvalue(res, i, i_reltuples), > &tptr); > and 002_compare_backups passed for me. Cool, but surely the right fix is to make pg_strtof() adhere to the POSIX specification, so we don't have to learn this lesson again elsewhere. I'll go make it so. Independently of that, do we want to switch over to storing reltuples as a string instead of converting it? I still feel uncomfortable about the amount of baggage we added to pg_dump to avoid that. regards, tom lane
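For illustration, the POSIX-conforming shape would look roughly like this (a sketch, not the committed fix; the real pg_strtof() also carries the Cygwin misrounding and ERANGE handling): strtof() and strtod() allow endptr == NULL, so the wrapper must do all of its own checks against a local pointer and only store through endptr when one was actually supplied.

```
#include <errno.h>
#include <stdlib.h>

/* Sketch only: endptr may legitimately be NULL per POSIX. */
static float
pg_strtof_sketch(const char *nptr, char **endptr)
{
	char	   *end;
	float		fresult;

	errno = 0;
	fresult = (float) strtod(nptr, &end);

	/* ... misrounding / over- and underflow checks would go here, using "end" ... */

	if (endptr != NULL)
		*endptr = end;
	return fresult;
}
```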
Re: Statistics Import and Export
On Sat, 2025-03-01 at 13:52 -0500, Greg Sabino Mullane wrote: > > Can you expand on some of those cases? > > Certainly. I think one of the problems is that because this patch is > solving a pg_upgrade issue, the focus is on the "dump and restore" > scenarios. But pg_dump is used for much more than that, especially > "dump and examine". Thank you for going through these examples. > I just don't think it should be enabled by default for everything > using pg_dump. For the record, I would not strongly object to having > stats on by default for binary dumps, although I would prefer them > off. I am open to that idea, I just want to get it right, because probably whatever the default is in 18 will stay that way. Also, we will need to think through the set of pg_dump options again. A lot of our tools seem to assume that "if it's the default, we don't need a way to ask for it explicitly", which makes it a lot harder to ever change the default and keep a coherent set of options. > So why not just expect people to modify their programs to use --no- > statistics for cases like this? That's certainly an option, but it's > going to break a lot of existing things, and create branching code: I suggest that we wait a bit to see what additional feedback we get early in beta. > Also, anything trained to parse pg_dump output will have to learn > about the new SELECT pg_restore_ calls with their multi-line formats > (not 100% sure we don't have that anywhere, as things like "SELECT > setval" and "SELECT set_config" are single line, but there may be > existing things) That's an interesting point. What tools are currently trying to parse pg_dump output? Regards, Jeff Davis
Re: access numeric data in module
Tom- I think that I can allay your concerns. Please give me a day or so to put together my more complete thoughts on the matter. I'll be in touch. -Ed On Sat, Mar 1, 2025 at 11:33 AM Tom Lane wrote: > Ed Behn writes: > >> There is actually no new code. Code is simply moved from numeric.c to > >> numeric.h. > > I will absolutely not hold still for that. It would mean that any > time we want to think about messing with the contents of numerics, > we need to examine more or less the whole Postgres code base to see > what else is poking into those structures. > > If we must do something like this, then a separate header > "numeric_internal.h" or something like that would reduce the blast > radius for changes. But IMO you still haven't made an acceptable case > for deciding that these data structures aren't private to numeric.c. > What behaviors do you actually need that aren't accessible via the > existing exported functons? > > regards, tom lane >
Re: Statistics Import and Export
On Fri, 2025-02-28 at 14:56 -0600, Nathan Bossart wrote: > On Fri, Feb 28, 2025 at 12:54:03PM -0800, Jeff Davis wrote: > > (Aside: I assume everyone here agrees that pg_upgrade should > > transfer > > the stats by default.) > > That feels like a safe assumption to me... I'm curious what the use- > case > is for pg_upgrade's --no-statistics option. Mostly completeness and paranoia. I don't see a real use case. If we decide we don't need it, that's fine with me. Regards, Jeff Davis