Re: buffer refcount leak in foreign batch insert code
Michael Paquier wrote on 2023-04-25 04:30:
> On Mon, Apr 24, 2023 at 09:57:10AM +0900, Michael Paquier wrote:
>> The attached is what I am finishing with, with a minimal regression
>> test added to postgres_fdw. Two partitions are enough.
>
> Well, I have gone through that again this morning, and applied the fix
> down to 14. The buildfarm is digesting it fine.
> --
> Michael

Thank you. Sorry for the late response, was on vacation.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: [PATCH] Add native windows on arm64 support
On 19/01/2023 10:09, Niyas Sait wrote:
> On 17/01/2023 22:51, Andres Freund wrote:
>>  int main(void)
>> @@ -1960,18 +1966,19 @@ int main(void)
>>  }
>>  '''
>>
>> -  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
>> -      args: test_c_args)
>> -    # Use ARM CRC Extension unconditionally
>> -    cdata.set('USE_ARMV8_CRC32C', 1)
>> -    have_optimized_crc = true
>> -  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc',
>> -      args: test_c_args + ['-march=armv8-a+crc'])
>> -    # Use ARM CRC Extension, with runtime check
>> -    cflags_crc += '-march=armv8-a+crc'
>> -    cdata.set('USE_ARMV8_CRC32C', false)
>> -    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
>> -    have_optimized_crc = true
>> +    if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
>> +        args: test_c_args)
>>
>> Seems like it'd be easier to read if you don't re-indent this, but just
>> have the cc.get_id() == 'msvc' part of this if/else-if.
>
> I've attached a new version (v8) to fix the above indentation issue.

Could someone please help with the review?

--
Niyas

From 108ad061caa15c7346d53e906794c4e971afbcc0 Mon Sep 17 00:00:00 2001
From: Niyas Sait
Date: Fri, 16 Dec 2022 10:45:56 +0000
Subject: [PATCH v8] Enable postgres native build for windows-arm64 platform

- Add support for meson build
- Add arm64 definition of spin_delay function
- Exclude arm_acle.h import for MSVC
---
 doc/src/sgml/install-windows.sgml |  3 ++-
 meson.build                       | 11 +++++++++--
 src/include/storage/s_lock.h      | 20 ++++++++++++++++++--
 src/port/pg_crc32c_armv8.c        |  2 ++
 src/tools/msvc/gendef.pl          |  8 ++++----
 5 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index cbc70a039c..2ecd5fcf38 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -352,7 +352,8 @@ $ENV{MSBFLAGS}="/m";
   Special Considerations for 64-Bit Windows

-  PostgreSQL will only build for the x64 architecture on 64-bit Windows.
+  PostgreSQL will only build for the x64 and ARM64 architectures on 64-bit
+  Windows.

diff --git a/meson.build b/meson.build
index 096044628c..6cca212fae 100644
--- a/meson.build
+++ b/meson.build
@@ -2045,8 +2045,11 @@ int main(void)
 elif host_cpu == 'arm' or host_cpu == 'aarch64'

   prog = '''
+#ifdef _MSC_VER
+#include <intrin.h>
+#else
 #include <arm_acle.h>
-
+#endif
 int main(void)
 {
 	unsigned int crc = 0;
@@ -2060,7 +2063,11 @@ int main(void)
 }
 '''

-  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
+  if cc.get_id() == 'msvc'
+    cdata.set('USE_ARMV8_CRC32C', false)
+    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+    have_optimized_crc = true
+  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
       args: test_c_args)
     # Use ARM CRC Extension unconditionally
     cdata.set('USE_ARMV8_CRC32C', 1)
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index c9fa84cc43..a7973eae49 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -707,15 +707,31 @@ typedef LONG slock_t;

 #define SPIN_DELAY() spin_delay()

-/* If using Visual C++ on Win64, inline assembly is unavailable.
- * Use a _mm_pause intrinsic instead of rep nop.
+/*
+ * If using Visual C++ on Win64, inline assembly is unavailable.
+ * Use architecture specific intrinsics.
  */
 #if defined(_WIN64)
+/*
+ * For Arm64, use __isb intrinsic. See aarch64 inline assembly definition
+ * for details.
+ */
+#ifdef _M_ARM64
+static __forceinline void
+spin_delay(void)
+{
+	/* Reference: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions */
+	__isb(_ARM64_BARRIER_SY);
+}
+#else
+/*
+ * For x64, use _mm_pause intrinsic instead of rep nop.
+ */
 static __forceinline void
 spin_delay(void)
 {
 	_mm_pause();
 }
+#endif
 #else
 static __forceinline void
 spin_delay(void)
diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index d8fae510cf..3d7eb748ff 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -14,7 +14,9 @@
  */
 #include "c.h"

+#ifndef _MSC_VER
 #include <arm_acle.h>
+#endif

 #include "port/pg_crc32c.h"
diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
index e7cbefcbc3..934dc17b96 100644
--- a/src/tools/msvc/gendef.pl
+++ b/src/tools/msvc/gendef.pl
@@ -120,9 +120,9 @@ sub writedef
 {
 	my $isdata = $def->{$f} eq 'data';

-	# Strip the leading underscore for win32, but not x64
+	# Strip the leading underscore for win32, but not x64 and aarch64
 	$f =~ s/^_//
-	  unless ($arch eq "x86_64");
+	  unless ($arch eq "x86_64" || $arch eq "aarch64");

 	# Emit
Re: min/max aggregation for jsonb
On 03.03.23 11:41, David Rowley wrote:
> On Fri, 3 Mar 2023 at 23:17, Daneel Yaitskov wrote:
>> I wanted to use min/max aggregation functions for jsonb type and
>> noticed there are no functions for this type, meanwhile string/array
>> types are supported.
>
> It's not really clear to me how you'd want these to sort. If you just
> want to sort by the output that you see from the type's output function
> then you might get what you need by casting to text.
>
>> Is there a concern about implementing support for jsonb in min/max?
>
> I imagine a lack of any meaningful way of comparing two jsonb values
> to find out which is greater than the other is of some concern.

We already have ordering operators and operator classes for jsonb, so
sticking min/max aggregates around that should be pretty straightforward.
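(A minimal sketch of both routes discussed above: the cast workaround, and an aggregate built on the existing jsonb btree ordering operator. The table docs(j jsonb) and the function name jsonb_greater are invented for illustration, not anything in core:)

    -- Workaround: order by the textual representation
    SELECT min(j::text), max(j::text) FROM docs;

    -- Sketch of max(jsonb) on top of the existing ordering operator
    CREATE FUNCTION jsonb_greater(a jsonb, b jsonb) RETURNS jsonb
        LANGUAGE sql IMMUTABLE STRICT
        AS 'SELECT CASE WHEN a > b THEN a ELSE b END';

    CREATE AGGREGATE max (jsonb) (
        SFUNC = jsonb_greater,
        STYPE = jsonb
    );

    SELECT max(j) FROM docs;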
Re: Logging parallel worker draught
On 5/1/23 18:33, Robert Haas wrote:
> Why not? It seems like something some people might want to log and
> others not. Running the whole server at DEBUG1 to get this information
> doesn't seem like a suitable answer.

Since the statement is also logged, it could spam the log with huge
queries, which might also be a reason to stop logging this kind of
problem until the issue is fixed. I attached the patch without the GUC
anyway, just in case.

> What I was wondering was whether we would be better off putting this
> into the statistics collector, vs. doing it via logging. Both
> approaches seem to have pros and cons.

We tried to explore different options with my colleagues in another
thread [1]. I feel like both solutions are worthwhile, and would be
helpful. I plan to take a look at the pg_stat_statement patch [2] next.
Since it's my first patch, I elected to choose the easiest solution to
implement first.

I also proposed this because I think logging can help pinpoint a lot of
problems at a minimal cost, since it is easy to set up and exploit for
everyone, everywhere.

[1] https://www.postgresql.org/message-id/d657df20-c4bf-63f6-e74c-cb85a81d0...@dalibo.com
[2] https://www.postgresql.org/message-id/flat/6acbe570-068e-bd8e-95d5-00c737b865e8%40gmail.com

--
Benoit Lobréau
Consultant
http://dalibo.com

From fc71ef40f33f94b0506a092fb5b3dcde6de6d60a Mon Sep 17 00:00:00 2001
From: benoit
Date: Tue, 2 May 2023 10:08:00 +0200
Subject: [PATCH] Add logging for exceeded parallel worker slot limits

Produce a log message when a backend attempts to spawn a parallel
worker but fails due to insufficient worker slots. The shortage can
stem from max_worker_processes, max_parallel_workers, or
max_parallel_maintenance_workers. The log message can help database
administrators and developers diagnose performance issues related to
parallelism and optimize the configuration of the system accordingly.
---
 src/backend/access/transam/parallel.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b26f2a64fb..c60d1bd739 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -630,6 +630,11 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 		pcxt->nknown_attached_workers = 0;
 	}

+	if (pcxt->nworkers_launched < pcxt->nworkers_to_launch)
+		ereport(LOG,
+				(errmsg("Parallel Worker draught during statement execution: workers spawned %d, requested %d",
+						pcxt->nworkers_launched, pcxt->nworkers_to_launch)));
+
 	/* Restore previous memory context. */
 	MemoryContextSwitchTo(oldcontext);
 }
--
2.39.2
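(To see the proposed message fire, one can starve the executor of worker slots; the table name is made up, and the logged worker counts depend on the plan:)

    SET max_parallel_workers = 0;   -- no slots available at execution time
    SET parallel_setup_cost = 0;    -- coax the planner into a parallel plan
    SET parallel_tuple_cost = 0;
    SELECT count(*) FROM big_table;
    -- with the patch applied, the server log then contains a line like:
    -- LOG:  Parallel Worker draught during statement execution: workers spawned 0, requested 2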
Re: Tab completion for CREATE SCHEMA AUTHORIZATION
Michael Paquier writes:

> On Sat, Apr 15, 2023 at 11:06:25AM +0900, Michael Paquier wrote:
>> Thanks, I'll look at it.
>
> + else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
> +          Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
> +     COMPLETE_WITH("CREATE", "GRANT");
> + else if (Matches("CREATE", "SCHEMA", MatchAny))
> +     COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
>
> I had this grammar under my eyes a few days ago for a different patch,
> and there are many more object types that can be appended to a CREATE
> SCHEMA, like triggers, sequences, tables or views, so this is
> incomplete, isn't it?

This is for completing the word CREATE itself after CREATE SCHEMA
[[<name>] AUTHORIZATION] <name>. The things that can come after that
are already handled generically earlier in the function:

	/* CREATE */
	/* complete with something you can create */
	else if (TailMatches("CREATE"))
		matches = rl_completion_matches(text, create_command_generator);

create_command_generator uses the words_after_create array, which lists
all the things that can be created.

- ilmari
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On 2023-Apr-07, Julien Rouhaud wrote: > That being said, I have a hard time believing that we could actually preserve > physical replication slots. I don't think that pg_upgrade final state is > fully > reproducible: not all object oids are preserved, and the various pg_restore > are run in parallel so you're very likely to end up with small physical > differences that would be incompatible with physical replication. Even if we > could make it totally reproducible, it would probably be at the cost of making > pg_upgrade orders of magnitude slower. And since many people are already > complaining that it's too slow, that doesn't seem like something we would > want. A point on preserving physical replication slots: because we change WAL format from one major version to the next (adding new messages or changing format for other messages), we can't currently rely on physical slots working across different major versions. So IMO, for now don't bother with physical replication slot preservation, but do keep the option name as specific to logical slots. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Logging parallel worker draught
On Mon, May 1, 2023 at 10:03 PM Robert Haas wrote: > > On Sat, Apr 22, 2023 at 7:06 AM Amit Kapila wrote: > > I don't think introducing a GUC for this is a good idea. We can > > directly output this message in the server log either at LOG or DEBUG1 > > level. > > Why not? It seems like something some people might want to log and > others not. Running the whole server at DEBUG1 to get this information > doesn't seem like a suitable answer. > We can output this at the LOG level to avoid running the server at DEBUG1 level. There are a few other cases where we are not able to spawn the worker or process and those are logged at the LOG level. For example, "could not fork autovacuum launcher process .." or "too many background workers". So, not sure, if this should get a separate treatment. If we fear this can happen frequently enough that it can spam the LOG then a GUC may be worthwhile. > What I was wondering was whether we would be better off putting this > into the statistics collector, vs. doing it via logging. Both > approaches seem to have pros and cons. > I think it could be easier for users to process the information if it is available via some view, so there is a benefit in putting this into the stats subsystem. -- With Regards, Amit Kapila.
Re: Add two missing tests in 035_standby_logical_decoding.pl
Hi,

On 5/2/23 8:28 AM, Amit Kapila wrote:
> On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand wrote:
>> I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only
>> (other machines are not complaining). It does fail on "invalidated
>> logical slots do not lead to retaining WAL", see
>> https://cirrus-ci.com/task/4518083541336064
>> I'm not sure why it is failing, any idea?
>
> I think the reason for the failure is that on standby, the test is not
> able to remove the file corresponding to the invalid slot. You are
> using pg_switch_wal() to generate a switch record and I think you need
> one more WAL-generating statement after that to achieve your purpose
> which is that during checkpoint, the test removes the WAL file
> corresponding to an invalid slot. Just doing checkpoint on primary may
> not serve the need as that doesn't lead to any new insertion of WAL on
> standby. Is your v6 failing in the same environment?

Thanks for the feedback! No, V6 was working fine.

> If not, then it is probably due to the reason that the test is doing
> insert after pg_switch_wal() in that version. Why did you change the
> order of insert in v7?

I thought doing the insert before the switch was OK, and as my local
test was running fine I did not re-consider the ordering.

> BTW, you can confirm the failure by changing the DEBUG2 message in
> RemoveOldXlogFiles() to LOG. In the case where the test fails, it may
> not remove the WAL file corresponding to an invalid slot whereas it
> will remove the WAL file when the test succeeds.

Yeah, I added more debug information and what I can see is that the WAL
file we want to see removed is "000000010000000000000003" while the
standby emits:

"
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  attempting to remove WAL segments older than log file 000000000000000000000002
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  recycled write-ahead log file "000000010000000000000002"
"

As per your suggestion, changing the insert ordering (like in V8
attached) makes it now work on the failing environment too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From defaeb000b09eab9ba7b5d08c032f81b5bd72a53 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 28 Apr 2023 07:27:20 +0000
Subject: [PATCH v8] Add a test to verify that invalidated logical slots do
 not lead to retaining WAL.

---
 .../t/035_standby_logical_decoding.pl | 39 ++-
 1 file changed, 37 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 66d264f230..a6b640d7fd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -500,9 +500,44 @@ $node_standby->restart;
 check_slots_conflicting_status(1);

 ##################################################
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##################################################
-# X TODO
+
+# Before removing WAL file(s), ensure the cascading standby catches up
+$node_standby->wait_for_replay_catchup($node_cascading_standby,
+	$node_primary);
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+	"SELECT restart_lsn from pg_replication_slots WHERE slot_name = 'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# As pg_walfile_name() can not be executed on the standby,
+# get the WAL file name associated to this lsn from the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Generate some activity and switch WAL file on the primary
+$node_primary->safe_psql(
+	'postgres', "create table retain_test(a int);
+				 select pg_switch_wal();
+				 insert into retain_test values(1);
+				 checkpoint;");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Request a checkpoint on the standby to trigger the WAL file(s) removal
+$node_standby->safe_psql('postgres', 'checkpoint;');
+
+# Verify that the WAL file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+ok(!-f "$standby_walfile",
+	"invalidated logical slots do not lead to retaining WAL");

 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
--
2.34.1
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Hi, On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote: > On 2023-Apr-07, Julien Rouhaud wrote: > > > That being said, I have a hard time believing that we could actually > > preserve > > physical replication slots. I don't think that pg_upgrade final state is > > fully > > reproducible: not all object oids are preserved, and the various pg_restore > > are run in parallel so you're very likely to end up with small physical > > differences that would be incompatible with physical replication. Even if > > we > > could make it totally reproducible, it would probably be at the cost of > > making > > pg_upgrade orders of magnitude slower. And since many people are already > > complaining that it's too slow, that doesn't seem like something we would > > want. > > A point on preserving physical replication slots: because we change WAL > format from one major version to the next (adding new messages or > changing format for other messages), we can't currently rely on physical > slots working across different major versions. I don't think anyone suggested to do physical replication over different major versions. My understanding was that it would be used to pg_upgrade a "physical cluster" (e.g. a primary and physical standby server) at the same time, and then simply starting them up again would lead to a working physical replication on the new version. I guess one could try to keep using the slots for other needs (PITR backup with pg_receivewal or something similar), and then you would indeed have to be aware that you won't be able to do anything with the new WAL records until you do a fresh base backup, but that's a problem that you can already face after a normal pg_upgrade (although in most cases it's probably quite obvious for now as the timeline isn't preserved).
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Tue, 2 May 2023, 19:43 Julien Rouhaud, wrote:
> Hi,
>
> On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> > On 2023-Apr-07, Julien Rouhaud wrote:
> >
> > > That being said, I have a hard time believing that we could
> > > actually preserve physical replication slots. I don't think that
> > > pg_upgrade final state is fully reproducible: not all object oids
> > > are preserved, and the various pg_restore are run in parallel so
> > > you're very likely to end up with small physical differences that
> > > would be incompatible with physical replication. Even if we could
> > > make it totally reproducible, it would probably be at the cost of
> > > making pg_upgrade orders of magnitude slower. And since many people
> > > are already complaining that it's too slow, that doesn't seem like
> > > something we would want.
> >
> > A point on preserving physical replication slots: because we change
> > WAL format from one major version to the next (adding new messages or
> > changing format for other messages), we can't currently rely on
> > physical slots working across different major versions.
>
> I don't think anyone suggested to do physical replication over
> different major versions. My understanding was that it would be used
> to pg_upgrade a "physical cluster" (e.g. a primary and physical standby
> server) at the same time, and then simply starting them up again would
> lead to a working physical replication on the new version.
>
> I guess one could try to keep using the slots for other needs (PITR
> backup with pg_receivewal or something similar), and then you would
> indeed have to be aware that you won't be able to do anything with the
> new WAL records until you do a fresh base backup, but that's a problem
> that you can already face after a normal pg_upgrade (although in most
> cases it's probably quite obvious for now as the timeline isn't
> preserved).

If what you meant is that the slot may have to send a record generated
by an older major version, then unless I'm missing something the same
restriction could be added to such a feature as what's being discussed
in this thread for the logical replication slots, so only a final
shutdown checkpoint record would be present after the flushed WAL
position. It may be possible to work around that, if there weren't all
the other problems I mentioned.
Re: Tab completion for CREATE SCHEMA AUTHORIZATION
Dagfinn Ilmari Mannsåker writes:

> Michael Paquier writes:
>
>> On Sat, Apr 15, 2023 at 11:06:25AM +0900, Michael Paquier wrote:
>>> Thanks, I'll look at it.
>>
>> + else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
>> +          Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
>> +     COMPLETE_WITH("CREATE", "GRANT");
>> + else if (Matches("CREATE", "SCHEMA", MatchAny))
>> +     COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
>>
>> I had this grammar under my eyes a few days ago for a different patch,
>> and there are many more object types that can be appended to a CREATE
>> SCHEMA, like triggers, sequences, tables or views, so this is
>> incomplete, isn't it?
>
> This is for completing the word CREATE itself after CREATE SCHEMA
> [[<name>] AUTHORIZATION] <name>. The things that can come after that
> are already handled generically earlier in the function:
>
> 	/* CREATE */
> 	/* complete with something you can create */
> 	else if (TailMatches("CREATE"))
> 		matches = rl_completion_matches(text, create_command_generator);
>
> create_command_generator uses the words_after_create array, which lists
> all the things that can be created.

But, looking closer at the docs, only tables, views, indexes, sequences
and triggers can be created as part of a CREATE SCHEMA statement. Maybe
we should add a HeadMatches("CREATE", "SCHEMA") exception in the above?

- ilmari
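(For reference, the embedded-element form being completed here, adapted from the CREATE SCHEMA documentation example; the GRANT element is added to match the completion list, and all object names are illustrative:)

    CREATE SCHEMA hollywood
        CREATE TABLE films (title text, release date, awards text[])
        CREATE VIEW winners AS
            SELECT title, release FROM films WHERE awards IS NOT NULL
        GRANT SELECT ON winners TO PUBLIC;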
Re: Add PQsendSyncMessage() to libpq
On Mon, May 1, 2023 at 8:42 PM Michael Paquier wrote: > Another thing that may matter in terms of extensibility? Would a > boolean argument really be the best design? Could it be better to > have instead one API with a bits32 and some flags controlling its > internals? I wondered that, too. If we never add any more Boolean parameters to this function then that would end up a waste, but maybe we will and then it will be genius. Not sure what's best. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ICU locale validation / canonicalization
On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote: > On 30.03.23 04:33, Jeff Davis wrote: > >Attached is a new version of the final patch, which performs > >canonicalization. I'm not 100% sure that it's wanted, but it still > >seems like a good idea to get the locales into a standard format in the > >catalogs, and if a lot more people start using ICU in v16 (because it's > >the default), then it would be a good time to do it. But perhaps there > >are risks? > > I say, let's do it. The following is not cause for postgresql.git changes at this time, but I'm sharing it in case it saves someone else the study effort. Commit ea1db8a ("Canonicalize ICU locale names to language tags.") slowed buildfarm member hoverfly, but that disappears if I drop debug_parallel_query from its config. Typical end-to-end duration rose from 2h5m to 2h55m. Most-affected were installcheck runs, which rose from 11m to 19m. (The "check" stage uses NO_LOCALE=1, so it changed less.) From profiles, my theory is that each of the many parallel workers burns notable CPU and I/O opening its ICU collator for the first time. debug_parallel_query, by design, pursues parallelism independent of cost, so this is working as intended. If it ever matters in non-debug configurations, we might raise the default parallel_setup_cost or pre-load ICU collators in the postmaster.
Cleaning up array_in()
This is in response to Alexander's observation at [1], but I'm starting
a fresh thread to keep this patch separate from the plperl fixes in the
cfbot's eyes.

Alexander Lakhin writes:
> I continue watching the array handling bugs dancing Sirtaki too. Now
> it's another asymmetry:
> select '{{1},{{2}}}'::int[];
> {{{1}},{{2}}}
> but:
> select '{{{1}},{2}}'::int[];
> {}

Bleah. Both of those should be rejected, for sure, but it's the same
situation as in the PLs: we weren't doing anything to enforce that all
the scalar elements appear at the same nesting depth.

I spent some time examining array_in(), and was pretty disheartened by
what a mess it is. It looks like back in the dim mists of the Berkeley
era, there was an intentional attempt to allow non-rectangular array
input, with the missing elements automatically filled out as NULLs.
Since that was undocumented, we concluded it was a bug and plastered on
some code to check for rectangularity of the input. I don't quibble
with enforcing rectangularity, but the underlying logic should have
been simplified while we were at it. The element-counting logic was
basically magic (why is it okay to increment temp[ndim - 1] when the
current nest_level might be different from that?) and the extra layers
of checks didn't make it any more intelligible. Plus, ReadArrayStr was
expending far more cycles than it needs to given the assumption of
rectangularity.

So, here's a rewrite. Although I view this as a bug fix, AFAICT the
only effects are to accept input that should be rejected. So again I
don't advocate back-patching. But should we sneak it into v16, or wait
for v17?

regards, tom lane

[1] https://www.postgresql.org/message-id/9cd163da-d096-7e9e-28f6-f3620962a660%40gmail.com

From 6a9fe8117e1b91958111c679d02a2bd7944fae22 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Mon, 1 May 2023 18:31:40 -0400
Subject: [PATCH v1 1/2] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing. We disallowed non-rectangular input ages
ago (commit 0e13d627b), but never simplified this function as a
follow-up. In particular, the existing code recomputes each element's
linear location from scratch, which is quite unnecessary for
rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic. Add some more commentary while at it.

(This leaves ArrayGetOffset0() unused, but I'm unsure whether to
remove that.)
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 87c987fb27..39b5efc661 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ static bool array_isspace(char ch);
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 					   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-						 int nitems, int ndim, int *dim,
+						 int nitems,
 						 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 						 char typdelim,
 						 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-					  nitems, ndim, dim,
+					  nitems,
 					  &my_extra->proc, typioparam, typmod,
 					  typdelim,
 					  typlen, typbyval, typalign,
@@ -457,7 +457,8 @@ array_isspace(char ch)

 /*
  * ArrayCount
- *	 Determines the dimensions for an array string.
+ *	 Determines the dimensions for an array string. This includes
+ *	 syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result. The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -704,16 +705,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *	 parses the array string pointed to by "arrayStr" and converts the values
- *	 to internal format. Unspecified elements are initialized to nulls.
- *	 The array dimensions must already have been determined.
+ *	 to internal format. The array dimensions must have been determined,
+ *	 and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *	arrayStr: the string to parse.
  *			  CAUTION: the contents of "arrayStr" will be modified!
  *	origStr: the unmodified input string, used only in error messages.
  *	nitems: total number of array elements, as already determined.
- *	ndim: number of array dimensions
- *	dim[]: array axis lengths
  *	inputproc: type-specific input procedure for element datatype.
  *	typioparam, typmod: auxiliary values to pass to inputproc.
  *	typdelim: the value delimiter (type-specific).
code cleanup for CREATE STATISTICS
Hi,

There is at present a comment at the top of transformStatsStmt which
says "To avoid race conditions, it's important that this function
relies only on the passed-in relid (and not on stmt->relation) to
determine the target relation." However, doing what the comment says we
need to do doesn't actually avoid the problem we need to avoid.

The issue here, as I understand it, is that if we look up the same
less-than-fully-qualified name multiple times, we might get different
answers due to concurrent activity, and that might create a security
vulnerability, along the lines of CVE-2014-0062. So what the code
should be doing is looking up the user-provided name just once and then
using that value throughout all subsequent processing stages, but it
doesn't actually. The caller of transformStatsStmt() looks up the
RangeVar and gets an OID, but that value is then discarded and
CreateStatistics does another lookup on the same name, which means that
we're not really avoiding the hazard about which the comment seems to
be concerned.

So that leads to the question of whether there's a security
vulnerability here. I and a few other members of the pgsql-security
list haven't been able to find one in brief review, but we may have
missed something. Fortunately, the permissions checks happen after the
second name lookup inside CreateStatistics(), so it doesn't seem that,
for example, you can leverage this to create extended statistics on a
table that you don't own. You can possibly get the first part of the
operation, where we transform the CREATE STATISTICS command's WHERE
clause, to operate on one table that you do own and then the second
part on another table that you don't own, but even if that's so, the
second part is just going to fail before doing anything interesting, so
it doesn't seem like there's a problem. If anyone reading this can spot
an exploit, please speak up!

So the attached patch is proposed as code cleanup, rather than as a
security fix. It changes the code to avoid the duplicate name lookup
altogether. There is no reason that I know of why this needs to be
back-patched for correctness, but I think it's worth putting into
master to make the code nicer and avoid doing things that in some
circumstances can be risky.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

v2-0001-Refactor-transformStatsStmt-to-avoid-repeated-nam.patch
Description: Binary data
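(For context, the command shape involved, with invented names; the relation named after FROM is the less-than-fully-qualified name whose repeated lookup is at issue:)

    CREATE STATISTICS sales_stats (ndistinct, dependencies)
        ON region, product_id FROM sales;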
Re: "variable not found in subplan target list"
Hi Amit,

On 2023-Mar-30, Alvaro Herrera wrote:
> On 2023-Mar-29, Amit Langote wrote:
>> Though, I wonder if we need to keep ec386948948 that introduced the
>> notion of part_prune_index around if the project that needed it [1]
>> has moved on to an entirely different approach altogether, one that
>> doesn't require hacking up the pruning code.
>
> Hmm, that's indeed tempting.

We have an open item about this, and I see no reason not to do it. I
checked, and putting things back is just a matter of reverting
589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
conflicts, and bumping catversion once more.

Would you like to do that yourself, or do you prefer that I do it?
Ideally, we'd do it before beta1.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
On Mon, May 1, 2023 at 11:21 PM Peter Geoghegan wrote: > I can't imagine why you feel it necessary to communicate with me like > this. This is just vitriol, lacking any substance. John's email is pretty harsh, but I can understand why he's frustrated. I told you that I did not agree with your dislike for the term wraparound and I explained why. You sent a couple more emails telling me that I was wrong and, frankly, saying a lot of things that seem only tangentially related to the point that I was actually making. You seem to expect other people to spend a LOT OF TIME trying to understand what you're trying to say, but you don't seem to invest similar effort in trying to understand what they're trying to say. I couldn't even begin to grasp what your point was until Maciek stepped in to explain, and I still don't really agree with it, and I expect that no matter how many emails I write about that, your position won't budge an iota. It's really demoralizing. If I just vote -1 on the patch set, then I'm a useless obstruction. If I actually try to review it, we'll exchange 100 emails and I won't get anything else done for the next two weeks and I probably won't feel much better about the patch at the end of that process than at the beginning. I don't see that I have any winning options here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
On Tue, May 2, 2023 at 1:29 PM Robert Haas wrote: > I told you that I did not agree with your dislike for the term > wraparound and I explained why. You sent a couple more emails telling > me that I was wrong and, frankly, saying a lot of things that seem > only tangentially related to the point that I was actually making. I agree that that's what I did. You're perfectly entitled to find that annoying (though I maintain that my point about the 64-bit XID space was a good one, assuming the general subject matter was of interest). However, you're talking about this as if I dug my feet in on a substantive issue affecting the basic shape of the patch -- I don't believe that that conclusion is justified by anything I've said or done. I'm not even sure that we disagree on some less important point that will directly affect the patch (it's quite possible, but I'm not even sure of it). I've already said that I don't think that the term wraparound is going anywhere anytime soon (granted, that was on the other thread). So it's not like I'm attempting to banish all existing use of that terminology within the scope of this patch series -- far from it. At most I tried to avoid inventing new terms that contain the word "wraparound" (also on the other thread). The topic originally came up in the context of moving talk about physical wraparound to an entirely different chapter. Which is, I believe (based in part on previous discussions), something that all three of us already agree on! So again, I must ask: is there actually a substantive disagreement at all? > It's really demoralizing. If I just vote -1 on the patch set, then I'm > a useless obstruction. If I actually try to review it, we'll exchange > 100 emails and I won't get anything else done for the next two weeks > and I probably won't feel much better about the patch at the end of > that process than at the beginning. I don't see that I have any > winning options here. I've already put a huge amount of work into this. It is inherently a very difficult thing to get right -- it's not hard to understand why it was put off for so long. Why shouldn't I have opinions, given all that? I'm frustrated too. Despite all this, John basically agreed with my high level direction -- all of the important points seemed to have been settled without any arguments whatsoever (also based in part on previous discussions). John's volley of abuse seemed to come from nowhere at all. -- Peter Geoghegan
Rename 'lpp' to 'lp' in heapam.c
Hi,

I just found that the naming of the ItemId variables is not consistent
in heapam.c. There are 13 'lpp's and 112 'lp's. Technically 'lpp' is
correct, as ItemId is a line pointer's pointer and there used to be
code like "++lpp" for line pointer array iteration. Now that all the
"++lpp" code has been removed and there are 100+ more occurrences of
'lp' than 'lpp', I suggest we change 'lpp' to 'lp' to make things
consistent and avoid confusion.

Best Regards,
Zian Wang

v-1-0001-Rename-lpp-to-lp-in-heapam.c.patch
Description: Binary data
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Tue, May 2, 2023 at 9:55 AM Peter Geoghegan wrote:
>
> On Mon, May 1, 2023 at 5:34 AM John Naylor wrote:
>> 0003 is now a quick-and-dirty attempt at that, only in the docs. The
>> MXID part is mostly copy-pasted from the XID part, just to get
>> something together. I'd like to abbreviate that somehow.
>
> Yeah, the need to abbreviate statements about MultiXact IDs by saying
> that they work analogously to XIDs in some particular respect
> is...another thing that makes this tricky.

Then it sounds like they should stay separate. A direct copy-paste is
not good for style, so I will add things like:

- If for some reason autovacuum fails to clear old MXIDs from a table, the
+ As in the case with XIDs, it is possible for autovacuum to fail to
[...]

It might at least be good for readability to gloss over the warning and
only quote the MXID limit error message, but we'll have to see how it
looks.

--
John Naylor
EDB: http://www.enterprisedb.com
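(For readers following along, both counters the docs discuss can be checked with the corresponding age functions; a quick illustration, not part of the patch:)

    SELECT datname,
           age(datfrozenxid)    AS xid_age,
           mxid_age(datminmxid) AS mxid_age
    FROM pg_database
    ORDER BY mxid_age DESC;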
Re: Rename 'lpp' to 'lp' in heapam.c
On Wed, 3 May 2023 at 12:16, Yaphters W wrote: > I just found the naming of the ItemId variables is not consistent in > heapam.c. There are 13 'lpp's and 112 'lp's. Technically 'lpp' is correct as > ItemId is a line pointer's pointer and there used to be code like "++lpp" for > line pointer array iteration. Now that all the "++lpp" code has been removed > and there are 100+ more occurrences of 'Ip' than 'lpp', I suggest we change > 'lpp' to 'lp' to make things consistent and avoid confusion. I don't really agree that one is any more correct than the other. I also don't think we should be making changes like this as doing this may give some false impression that we have some standard to follow here that a local variable of a given type must be given a certain name. To comply with such a standard seems like it would take close to an endless number of patches which would just result in wasted reviewer and committer time and give us nothing but pain while back patching. -1 from me. David
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Tue, May 2, 2023 at 6:46 PM John Naylor wrote: > It might least be good for readability to gloss over the warning and only > quote the MXID limit error message, but we'll have to see how it looks. Apparently you expect me to join you in pretending that you didn't lambast my review on this thread less than 24 hours ago [1]. I happen to believe that this particular patch is of great strategic importance, so I'll admit that I thought about it for a second. But just a second -- I have more self-respect than that. That's not the only reason, though. I also genuinely don't have the foggiest notion what was behind what you said. In particular, this part still makes zero sense to me: "Claim that others are holding you back, and then try to move the goalposts in their work" Let me get this straight: "Moving the goalposts of their work" refers to something *I* did to *you*, on *this* thread...right? If anything, I'm biased in *favor* of this patch. The fact that you seem to think that I was being obstructionist just doesn't make any sense to me, at all. I really don't know where to go from there. I'm not so much upset as baffled. [1] https://postgr.es/m/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8s...@mail.gmail.com -- Peter Geoghegan
Re: COPY TO STDOUT Apache Arrow support
Hi, There is also a new Arrow C library (one .h and one .c file) which makes it easier to use it from the postgresql codebase. https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/ https://github.com/apache/arrow-nanoarrow/tree/main/dist Best regards, Adam Lippai On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai wrote: > Hi, > > There are two bigger developments in this topic: > >1. Pandas 2.0 is released and it can use Apache Arrow as a backend >2. Apache Arrow ADBC is released which standardizes the client API. >Currently it uses the postgresql wire protocol underneath > > Best regards, > Adam Lippai > > On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai wrote: > >> Hi, >> >> would it be possible to add Apache Arrow streaming format to the copy >> backend + frontend? >> The use case is fetching (or storing) tens or hundreds of millions of >> rows for client side data science purposes (Pandas, Apache Arrow compute >> kernels, Parquet conversion etc). It looks like the serialization overhead >> when using the postgresql wire format can be significant. >> >> Best regards, >> Adam Lippai >> >
Re: COPY TO STDOUT Apache Arrow support
Hi

On Wed, 3 May 2023 at 5:15, Adam Lippai wrote:
> Hi,
>
> There is also a new Arrow C library (one .h and one .c file) which
> makes it easier to use it from the postgresql codebase.
>
> https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
> https://github.com/apache/arrow-nanoarrow/tree/main/dist
>
> Best regards,
> Adam Lippai

With 9fcdf2c787ac6da330165ea3cd50ec5155943a2b it can be implemented in
an extension.

Regards

Pavel

> On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai wrote:
>
>> Hi,
>>
>> There are two bigger developments in this topic:
>>
>>    1. Pandas 2.0 is released and it can use Apache Arrow as a backend
>>    2. Apache Arrow ADBC is released which standardizes the client API.
>>       Currently it uses the postgresql wire protocol underneath
>>
>> Best regards,
>> Adam Lippai
>>
>> On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai wrote:
>>
>>> Hi,
>>>
>>> would it be possible to add Apache Arrow streaming format to the copy
>>> backend + frontend?
>>> The use case is fetching (or storing) tens or hundreds of millions of
>>> rows for client side data science purposes (Pandas, Apache Arrow
>>> compute kernels, Parquet conversion etc). It looks like the
>>> serialization overhead when using the postgresql wire format can be
>>> significant.
>>>
>>> Best regards,
>>> Adam Lippai
Re: Large files for relations
On Tue, May 2, 2023 at 3:28 PM Pavel Stehule wrote: > I like this patch - it can save some system sources - I am not sure how much, > because bigger tables usually use partitioning usually. Yeah, if you only use partitions of < 1GB it won't make a difference. Larger partitions are not uncommon, though. > Important note - this feature breaks sharing files on the backup side - so > before disabling 1GB sized files, this issue should be solved. Hmm, right, so there is a backup granularity continuum with "whole database cluster" at one end, "only files whose size, mtime [or optionally also checksum] changed since last backup" in the middle, and "only blocks that changed since LSN of last backup" at the other end. Getting closer to the right end of that continuum can make backups require less reading, less network transfer, less writing and/or less storage space depending on details. But this proposal moves the middle thing further to the left by changing the granularity from 1GB to whole relation, which can be gargantuan with this patch. Ultimately we need to be all the way at the right on that continuum, and there are clearly several people working on that goal. I'm not involved in any of those projects, but it's fun to think about an alien technology that produces complete standalone backups like rsync --link-dest (as opposed to "full" backups followed by a chain of "incremental" backups that depend on it so you need to retain them carefully) while still sharing disk blocks with older backups, and doing so with block granularity. TL;DW something something WAL something something copy_file_range().
Re: Large files for relations
On Wed, May 3, 2023 at 5:21 PM Thomas Munro wrote: > rsync --link-dest I wonder if rsync will grow a mode that can use copy_file_range() to share blocks with a reference file (= previous backup). Something like --copy-range-dest. That'd work for large-file relations (assuming a file system that has block sharing, like XFS and ZFS). You wouldn't get the "mtime is enough, I don't even need to read the bytes" optimisation, which I assume makes all database hackers feel a bit queasy anyway, but you'd get the space savings via the usual rolling checksum or a cheaper version that only looks for strong checksum matches at the same offset, or whatever other tricks rsync might have up its sleeve.
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Hi,

On Mon, Apr 24, 2023 at 2:58 PM Peter Geoghegan wrote:

> My work on page-level freezing for PostgreSQL 16 has some remaining loose ends to tie up with the documentation. The "Routine Vacuuming" section of the docs has no mention of page-level freezing. It also doesn't mention the FPI optimization added by commit 1de58df4. This isn't a small thing to leave out; I fully expect that the FPI optimization will very significantly alter when and how VACUUM freezes. The cadence will look quite a lot different.
>
> It seemed almost impossible to fit discussion of page-level freezing into the existing structure. In part this is because the existing documentation emphasizes the worst case scenario, rather than talking about freezing as a maintenance task that affects physical heap pages in roughly the same way as pruning does. There isn't a clean separation of things that would allow me to just add a paragraph about the FPI thing.
>
> Obviously it's important that the system never enters xidStopLimit mode -- not being able to allocate new XIDs is a huge problem. But it seems unhelpful to define that as the only goal of freezing, or even the main goal. To me this seems similar to defining the goal of cleaning up bloat as avoiding completely running out of disk space; while it may be "the single most important thing" in some general sense, it isn't all that important in most individual cases. There are many very bad things that will happen before that extreme worst case is hit, which are far more likely to be the real source of pain.
>
> There are also very big structural problems with "Routine Vacuuming", that I also propose to do something about. Honestly, it's a huge mess at this point. It's nobody's fault in particular; there has been accretion after accretion added, over many years. It is time to finally bite the bullet and do some serious restructuring. I'm hoping that I don't get too much push back on this, because it's already very difficult work.

Thanks for taking the time to do this. It is indeed difficult work. I'll give my perspective as someone who has not read the vacuum code but has learnt most of what I know about autovacuum / vacuuming by reading the "Routine Vacuuming" page 10s of times.

> Attached patch series shows what I consider to be a much better overall structure. To make this convenient to take a quick look at, I also attach a prebuilt version of routine-vacuuming.html (not the only page that I've changed, but the most important set of changes by far).
>
> This initial version is still quite lacking in overall polish, but I believe that it gets the general structure right. That's what I'd like to get feedback on right now: can I get agreement with me about the general nature of the problem? Does this high level direction seem like the right one?

There are things I like about the changes you've proposed and some where I feel that the previous section was easier to understand. I'll comment inline on the summary below and will put in a few points about things I think can be improved at the end.

> The following list is a summary of the major changes that I propose:
>
> 1. Restructures the order of items to match the actual processing order within VACUUM (and ANALYZE), rather than jumping from VACUUM to ANALYZE and then back to VACUUM.
>
> This flows a lot better, which helps with later items that deal with freezing/wraparound.

+1

> 2. Renamed "Preventing Transaction ID Wraparound Failures" to "Freezing to manage the transaction ID space". Now we talk about wraparound as a subtopic of freezing, not vice-versa. (This is a complete rewrite, as described by later items in this list).

+1 on this too. Freezing is a normal part of vacuuming and while the aggressive vacuums are different, I think just talking about the worst case scenario while referring to it is alarmist.

> 3. All of the stuff about modulo-2^32 arithmetic is moved to the storage chapter, where we describe the heap tuple header format.
>
> It seems crazy to me that the second sentence in our discussion of wraparound/freezing is still:
>
> "But since transaction IDs have limited size (32 bits) a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future"
>
> Here we start the whole discussion of wraparound (a particularly delicate topic) by describing how VACUUM used to work 20 years ago, before the invention of freezing. That was the last time that a PostgreSQL cluster could run for 4 billion XIDs without freezing. The invariant is that we activate xidStopLimit mode protections to avoid a "distance" between any two unfrozen XIDs that exceeds about 2 billion XIDs. So why on earth are we talking about 4 billion XIDs? This is the most c