Re: Syncrep and improving latency due to WAL throttling

2023-01-30 Thread Bharath Rupireddy
On Sat, Jan 28, 2023 at 6:06 AM Tomas Vondra
 wrote:
>
> >
> > That's not the sole goal, from my end: I'd like to avoid writing out +
> > flushing the WAL in too small chunks.  Imagine a few concurrent vacuums or
> > COPYs or such - if we're unlucky they'd each end up exceeding their 
> > "private"
> > limit close to each other, leading to a number of small writes of the
> > WAL. Which could end up increasing local commit latency / iops.
> >
> > If we instead decide to only ever flush up to something like
> >   last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ
> >
> > we'd make sure that the throttling mechanism won't cause a lot of small
> > writes.
> >
>
> I'm not saying we shouldn't do this, but I still don't see how this
> could make a measurable difference. At least assuming a sensible value
> of the throttling limit (say, more than 256kB per backend), and OLTP
> workload running concurrently. That means ~64 extra flushes/writes per
> 16MB segment (at most). Yeah, a particular page might get unlucky and be
> flushed by multiple backends, but the average still holds. Meanwhile,
> the OLTP transactions will generate (at least) an order of magnitude
> more flushes.

I think measuring the number of WAL flushes postgres generates with and
without this feature would be a great way to see its effect on IOPS. It
would probably be even better to also vary
synchronous_commit_flush_wal_after or the throttling limits.
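
To make the flush-boundary idea quoted above concrete, a rough sketch
(assumed names and types, not code from the patch) could look like:

```
/*
 * Rough sketch (assumed names, not from the patch; assumes the usual
 * xlogdefs.h definitions): compute a flush target that stays 1/8 of the
 * throttling window below the last completed WAL page, so that several
 * concurrently throttled backends don't each force a tiny write.
 */
static XLogRecPtr
ThrottleFlushTarget(XLogRecPtr insert_lsn, int throttle_pages)
{
	XLogRecPtr	last_page_boundary;

	/* Round down to the last fully written WAL page boundary. */
	last_page_boundary = insert_lsn - (insert_lsn % XLOG_BLCKSZ);

	/* Back off by 1/8 of the per-backend throttling window. */
	return last_page_boundary - (XLogRecPtr) (throttle_pages / 8) * XLOG_BLCKSZ;
}
```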

On Sat, Jan 28, 2023 at 6:08 AM Tomas Vondra
 wrote:
>
> On 1/27/23 22:19, Andres Freund wrote:
> > Hi,
> >
> > On 2023-01-27 12:06:49 +0100, Jakub Wartak wrote:
> >> On Thu, Jan 26, 2023 at 4:49 PM Andres Freund  wrote:
> >>
> >>> Huh? Why did you remove the GUC?
> >>
> >> After reading previous threads, my optimism level of getting it ever
> >> in shape of being widely accepted degraded significantly (mainly due
> >> to the discussion of wider category of 'WAL I/O throttling' especially
> >> in async case, RPO targets in async case and potentially calculating
> >> global bandwidth).
> >
> > I think it's quite reasonable to limit this to a smaller scope. Particularly
> > because those other goals are pretty vague but ambitious goals. IMO the
> > problem with a lot of the threads is precisely that they aimed at a level
> > of generality that isn't achievable in one step.
> >
>
> +1 to that

Okay, I agree to limit the scope first to synchronous replication.

Although v2 is a WIP patch, I have some comments:
1. Coding style doesn't conform to the Postgres standard:
+/*
+ * Called from ProcessMessageInterrupts() to avoid waiting while
being in critical section
+ */
Exceeds the 80-character line limit.

+void HandleXLogDelayPending()
A new line is missing between the function return type and the function name.

+elog(WARNING, "throttling WAL down on this session
(backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)",
+backendWalInserted,
+LSN_FORMAT_ARGS(XactLastThrottledRecEnd),
+LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));
Indentation issue - the continuation lines after elog(WARNING, ... need to be aligned.

+elog(WARNING, "throttling WAL down on this session - end (%f
ms)", timediff);
Exceeds the 80-character line limit; timediff); should be on a new line.

+//RESUME_INTERRUPTS();
+//HOLD_INTERRUPTS();
/* ... */ style multi-line comments are used elsewhere in the code, not // comments.

Better to run pgindent on the patch.
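
For illustration, this is roughly how the fragments quoted above would
look after pgindent and the usual style (a sketch only, not the actual
patch code; the variable names are taken from the quoted patch):

```
/*
 * Called from ProcessMessageInterrupts() to avoid waiting while being
 * inside a critical section.  (Comment wrapped to stay under the
 * 80-character limit; C-style block comments, not "//".)
 */
void
HandleXLogDelayPending(void)
{
	/* body elided; only the formatting of the quoted fragments is shown */
	elog(WARNING, "throttling WAL down on this session "
		 "(backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)",
		 backendWalInserted,
		 LSN_FORMAT_ARGS(XactLastThrottledRecEnd),
		 LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));
}
```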

2. It'd be better to add a TAP test hitting the WAL throttling.

3. We generally don't need timings to be calculated explicitly when we
emit before and after log messages; if needed, one can calculate the
wait time from the timestamps of the log messages. If an explicit time
difference is still wanted (which I don't think is required, because we
have a special event and before/after log messages), I think it needs to
be under track_wal_io_timing to keep things simple.
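
If an explicit measurement is kept, a minimal sketch of gating it behind
track_wal_io_timing (assumed placement and names, not the patch itself)
could look like:

```
/*
 * Sketch only (assumed placement): measure the wait only when
 * track_wal_io_timing is enabled, so the default path pays no
 * instrumentation cost.
 */
instr_time	wait_start,
			wait_end;

if (track_wal_io_timing)
	INSTR_TIME_SET_CURRENT(wait_start);

/* ... wait here for the WAL flush / sync commit confirmation ... */

if (track_wal_io_timing)
{
	INSTR_TIME_SET_CURRENT(wait_end);
	INSTR_TIME_SUBTRACT(wait_end, wait_start);
	elog(DEBUG1, "throttled WAL wait: %0.3f ms",
		 INSTR_TIME_GET_MILLISEC(wait_end));
}
```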

4. XLogInsertRecord() can be a hot path for high-write workloads, so the
effect of adding anything to it needs to be measured. It's better to run
benchmarks with this feature enabled and disabled.

5. Missing documentation of this feature and the GUC. I think we can
describe extensively why we'd need a feature of this kind in the
documentation for better adoption or usability.

6. Shouldn't the max limit be MAX_KILOBYTES?
+&synchronous_commit_flush_wal_after,
+0, 0, 1024L*1024L,

7. Can this GUC name be something like
synchronous_commit_wal_throttle_threshold to better reflect what it
does for a backend?
+{"synchronous_commit_flush_wal_after", PGC_USERSET,
REPLICATION_SENDING,

8. A typo - s/confiration/confirmation
+gettext_noop("Sets the maximum logged WAL in kbytes,
after which wait for sync commit confiration even without commit "),

9. This
"Sets the maximum logged WAL in kbytes, after which wait for sync
commit confiration even without commit "
better be something like below?
"Sets the maximum amount of WAL in kilobytes a backend generates after
which it waits for synchronous commit confiration even without commit
"


Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-30 Thread Thomas Munro
On Mon, Jan 30, 2023 at 6:36 PM Andres Freund  wrote:
> On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> > On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro  wrote:
> > > out-of-order hazard
> >
> > I've been trying to understand how that could happen, but my CPU-fu is
> > weak.  Let me try to write an argument for why it can't happen, so
> > that later I can look back at how stupid and naive I was.  We have A
> > B, and if the CPU sees no dependency and decides to execute B A
> > (pipelined), shouldn't an interrupt either wait for the whole
> > schemozzle to commit first (if not in a hurry), or nuke it, handle the
> > IPI and restart, or something?
>
> In a core local view, yes, I think so. But I don't think that's how it can
> work on multi-core, and even more so, multi-socket machines. Imagine how it'd
> influence latency if every interrupt on any CPU would prevent all out-of-order
> execution on any CPU.

Good.  Yeah, I was talking only about a single thread/core.

> > After an hour of reviewing random
> > slides from classes on out-of-order execution and reorder buffers and
> > the like, I think the term for making sure that interrupts run with
> > the illusion of in-order execution maintained is called "precise
> > interrupts", and it is expected in all modern architectures, after the
> > early OoO pioneers lost their minds trying to program without it.  I
> > guess generally you want that because it would otherwise run your
> > interrupt handler in a completely uncertain environment, and
> > specifically in this case it would reach our signal handler which
> > reads A's output (waiting) and writes to B's input (is_set), so B IPI
> > A surely shouldn't be allowed?
>
> Userspace signals aren't delivered synchronously during hardware interrupts
> afaik - and I don't think they even possibly could be (after all the process
> possibly isn't scheduled).

Yeah, they're not synchronous and the target might not even be
running.  BUT if a suitable thread is running then AFAICT an IPI is
delivered to that sucker to get it running the handler ASAP, at least
on the three OSes I looked at.  (See breadcrumbs below).

> I think what you're talking about with precise interrupts above is purely
> about the single-core view, and mostly about hardware interrupts for faults
> etc. The CPU will unwind state from speculatively executed code etc on
> interrupt, sure - but I think that's separate from guaranteeing that you can't
> have stale cache contents *due to work by another CPU*.

Yeah.  I get the cache problem, a separate issue that does indeed look
pretty dodgy.  I guess I wrote my email out-of-order: at the end I
speculated that cache coherency probably can't explain this failure at
least in THAT bit of the source, because of that funky extra
self-SetLatch().  I just got spooked by the mention of out-of-order
execution and I wanted to chase it down and straighten out my
understanding.

> I'm not even sure that userspace signals are generally delivered via an
> immediate hardware interrupt, or whether they're processed at the next
> scheduler tick. After all, we know that multiple signals are coalesced, which
> certainly isn't compatible with synchronous execution. But it could be that
> that just happens when the target of a signal is not currently scheduled.

FreeBSD: By default, they are when possible, eg if the process is
currently running a suitable thread.  You can set sysctl
kern.smp.forward_signal_enabled=0 to turn that off, and then it works
more like the way you imagined (checking for pending signals at
various arbitrary times, not sure).  See tdsigwakeup() ->
forward_signal() -> ipi_cpu().

Linux: Well it certainly smells approximately similar.  See
signal_wake_up_state() -> kick_process() -> smp_send_reschedule() ->
smp_cross_call() -> __ipi_send_mask().  The comment for kick_process()
explains that it's using the scheduler IPI to get signals handled
ASAP.

Darwin: ... -> cpu_signal() -> something that talks about IPIs

Coalescing is happening not only at the pending signal level (an
invention of the OS), and then for the inter-processor wakeups there
is also interrupt coalescing.  It's latches all the way down.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-30 Thread Amit Kapila
On Mon, Jan 30, 2023 at 12:38 PM Kyotaro Horiguchi
 wrote:
>
> At Mon, 30 Jan 2023 11:56:33 +0530, Amit Kapila  
> wrote in
> > >
> > > The GUC is not stored in a catalog, but.. oh... it is multiplied by
> > > 1000.
> >
> > Which part of the patch you are referring to here? Isn't the check in
>
> Where recovery_min_apply_delay is used. It is allowed to be set up to
> INT_MAX but it is used as:
>
> >   delayUntil = TimestampTzPlusMilliseconds(xtime, 
> > recovery_min_apply_delay);
>
> Where the macro is defined as:
>
> > #define TimestampTzPlusMilliseconds(tz,ms) ((tz) + ((ms) * (int64) 1000))
>
> Which can lead to overflow, which is practically harmless.
>

But here tz is always TimestampTz (which is int64), so do we need to worry?
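
As a standalone illustration of why the overflow is practically harmless
(the real macro lives in src/include/utils/timestamp.h; this compiles on
its own):

```
#include <stdint.h>
#include <stdio.h>

typedef int64_t TimestampTz;	/* microseconds, as in PostgreSQL */

/* Same shape as the macro quoted above. */
#define TimestampTzPlusMilliseconds(tz, ms) ((tz) + ((ms) * (int64_t) 1000))

int
main(void)
{
	int			delay_ms = 2147483647;	/* INT_MAX on common platforms */
	TimestampTz now = 0;

	/*
	 * The multiplication happens in 64-bit arithmetic because the literal
	 * 1000 is cast to int64, so even INT_MAX milliseconds is only about
	 * 2^41 microseconds.  The addition could only wrap for a TimestampTz
	 * already close to INT64_MAX, far beyond any reachable timestamp.
	 */
	printf("%lld\n", (long long) TimestampTzPlusMilliseconds(now, delay_ms));
	return 0;
}
```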

> > the function defGetMinApplyDelay() sufficient to ensure that the
> > 'delay' value stored in the catalog will always be lesser than
> > INT_MAX?
>
> I'm concerned about cases where INT_MAX is wider than int32.  If we
> don't assume such cases, I'm fine with INT_MAX there.
>

I am not aware of such cases. Anyway, if any such case is discovered
then we need to change the checks in defGetMinApplyDelay(), right? If
so, then I think it is better to keep it as it is unless we know that
this could be an issue on some platform.

-- 
With Regards,
Amit Kapila.




Re: Check lateral references within PHVs for memoize cache keys

2023-01-30 Thread Richard Guo
On Tue, Jan 24, 2023 at 10:07 AM David Rowley  wrote:

> I'm surprised to see that it's only Memoize that ever makes use of
> lateral_vars. I'd need a bit more time to process your patch, but one
> additional thought I had was that I wonder if the following code is
> still needed in nodeMemoize.c
>
> if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids))
> cache_purge_all(node);
>
> Ideally, that would be an Assert failure, but possibly we should
> probably still call cache_purge_all(node) after Assert(false) so that
> at least we'd not start returning wrong results if we've happened to
> miss other cache keys. I thought maybe something like:


Hmm, I think this code is still needed because a parameter contained
in the subplan below a Memoize node may come from the parent plan, as in
the test query added in 411137a42.

EXPLAIN (COSTS OFF)
SELECT unique1 FROM tenk1 t0
WHERE unique1 < 3
  AND EXISTS (
SELECT 1 FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.hundred
WHERE t0.ten = t1.twenty AND t0.two <> t2.four OFFSET 0);
   QUERY PLAN

 Index Scan using tenk1_unique1 on tenk1 t0
   Index Cond: (unique1 < 3)
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Nested Loop
   ->  Index Scan using tenk1_hundred on tenk1 t2
 Filter: (t0.two <> four)
   ->  Memoize
 Cache Key: t2.hundred
 Cache Mode: logical
 ->  Index Scan using tenk1_unique1 on tenk1 t1
   Index Cond: (unique1 = t2.hundred)
   Filter: (t0.ten = twenty)
(13 rows)

Currently we don't have a way to add Params of uplevel vars to Memoize
cache keys.  So I think we still need to call cache_purge_all() each
time uplevel Params change.

Thanks
Richard


Re: Deadlock between logrep apply worker and tablesync worker

2023-01-30 Thread vignesh C
On Mon, 30 Jan 2023 at 13:00, houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 30, 2023 2:32 PM Amit Kapila  
> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
> > >
> > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
> > > >
> > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > check in replorigin_drop_guts() which is a static function and
> > > > called from only one place, so, will it be required to check at both 
> > > > places?
> > >
> > > There is a possibility that the initial check to verify if replication
> > > origin exists in replorigin_drop_by_name was successful but later one
> > > of either table sync worker or apply worker process might have dropped
> > > the replication origin,
> > >
> >
> > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > drop happens after the patch acquires the lock on the origin.
>
> Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
> already lock the origin before that. I think the check in replorigin_drop_guts
> is a custom check after calling SearchSysCache1 to get the tuple, but the 
> error
> should not happen as no concurrent drop can be performed.

This scenario is possible while creating a subscription: the apply worker
will try to drop the replication origin if the state is
SUBREL_STATE_SYNCDONE. The table sync worker sets the state to
SUBREL_STATE_SYNCDONE and updates the relation state before calling
replorigin_drop_by_name. Since that transaction is committed by the table
sync worker, the state is visible to the apply worker, so the apply
worker will try to drop the replication origin in parallel.
There is a race condition here: one of the two processes (table sync
worker or apply worker) acquires the lock and drops the replication
origin, while the other gets the lock only after the first has dropped
the origin and committed its transaction. The other process then tries to
drop the replication origin once it acquires the lock and gets the
error (from replorigin_drop_guts): cache lookup failed for replication
origin with ID.
So a concurrent drop is possible in this case.

Regards,
Vignesh




Re: Lazy JIT IR code generation to increase JIT speed with partitions

2023-01-30 Thread David Geier

Hi Dmitry,

On 1/27/23 16:18, Dmitry Dolgov wrote:

As I've noted off-list, there was noticeable difference in the dumped
bitcode, which I haven't noticed since we were talking mostly about
differences between executions of the same query.

Thanks for the clarification and also thanks for helping with this effort.

--
David Geier
(ServiceNow)





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-30 Thread Ankit Kumar Pandey



> On 30/01/23 11:01, John Naylor wrote:

> Since David referred to L3 size as the starting point of a possible
> configuration parameter, that's actually cache-conscious.

Okay, makes sense. I am correcting the error on my part.

> I'm not close enough to this thread to guess at the right direction
> (although I hope related work will help), but I have a couple general
> remarks:

> 1. In my experience, academic papers like to test sorting with
> register-sized numbers or strings. Our sorttuples are bigger, have
> complex comparators, and can fall back to fields in the full tuple.
> 2. That paper is over 20 years old. If it demonstrated something
> genuinely useful, some of those concepts would likely be implemented in
> the real-world somewhere. Looking for evidence of that might be a good
> exercise.
> 3. 20 year-old results may not carry over to modern hardware.
> 4. Open source software is more widespread in the academic world now
> than 20 years ago, so papers with code (maybe even the author's github)
> are much more useful to us in my view.
> 5. It's actually not terribly hard to make sorting faster for some
> specific cases -- the hard part is keeping other inputs of interest from
> regressing.
> 6. The bigger the change, the bigger the risk of regressing somewhere.

Thanks John, these inputs are actually what I was looking for. I will do 
more research based on these inputs and build up my understanding.



Regards,

Ankit





RE: Logical replication timeout problem

2023-01-30 Thread wangw.f...@fujitsu.com
On Mon, Jan 30, 2023 at 14:55 PM Amit Kapila  wrote:
> On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 
> wrote:
> > > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com
> > >  wrote:
> >
> > Yes, I think you are right.
> > Fixed this problem.
> >
> 
> + /*
> + * Trying to send keepalive message after every change has some
> + * overhead, but testing showed there is no noticeable overhead if
> + * we do it after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count < CHANGES_THRESHOLD)
> + return;
> ...
> + changes_count = 0;
> 
> I think it is better to have this threshold-related code in that
> caller as we have in the previous version. Also, let's modify the
> comment as follows:"
> It is possible that the data is not sent to downstream for a long time
> either because the output plugin filtered it or there is a DDL that
> generates a lot of data that is not processed by the plugin. So, in
> such cases, the downstream can timeout. To avoid that we try to send a
> keepalive message if required. Trying to send a keepalive message
> after every change has some overhead, but testing showed there is no
> noticeable overhead if we do it after every ~100 changes."

Changed as suggested.

I also removed the comment atop the function update_progress_txn_cb_wrapper to
be consistent with the nearby *_cb_wrapper functions.

Attach the new patch.
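
The counting scheme being discussed looks roughly like this (a sketch
with assumed names and placement; see the attached patch for the real
change):

```
/*
 * Sketch only (names and placement assumed; see the attached patch):
 * try to send a keepalive only every ~100 changes so that the check
 * adds no noticeable per-change overhead, while still preventing the
 * downstream from timing out when the plugin filters everything.
 */
#define CHANGES_THRESHOLD 100

static void
update_progress_txn(LogicalDecodingContext *ctx)
{
	static int	changes_count = 0;

	if (++changes_count < CHANGES_THRESHOLD)
		return;

	/* Try to send a keepalive so the subscriber's timeout is not hit. */
	OutputPluginUpdateProgress(ctx, false);

	changes_count = 0;
}
```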

Regards,
Wang Wei


v10-0001-Fix-the-logical-replication-timeout-during-proce.patch
Description:  v10-0001-Fix-the-logical-replication-timeout-during-proce.patch


Re: Making Vars outer-join aware

2023-01-30 Thread Richard Guo
On Tue, Jan 24, 2023 at 4:38 AM Tom Lane  wrote:

> Richard, are you planning to review this any more?  I'm getting
> a little antsy to get it committed.  For such a large patch,
> it's surprising it's had so few conflicts to date.


Sorry for the delayed reply.  I don't have any more review comments at
the moment, except a nitpicking one.

In optimizer/README at line 729 there is a query as

SELECT * FROM a
  LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
WHERE a.x = 1;

I think it should be

SELECT * FROM a
  LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
WHERE a.x = 1;

I have no objection to get it committed.

Thanks
Richard


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-30 Thread Takamichi Osumi (Fujitsu)
On Saturday, January 28, 2023 1:28 PM I wrote:
> Attached the updated patch v24.
Hi,


I've conducted the rebase affected by the commit(1e8b61735c)
by renaming the GUC to logical_replication_mode accordingly,
because it's utilized in the TAP test of this time-delayed LR feature.
There is no other change for this version.

Kindly have a look at the attached v25.


Best Regards,
Takamichi Osumi



v25-0001-Time-delayed-logical-replication-subscriber.patch
Description:  v25-0001-Time-delayed-logical-replication-subscriber.patch


MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Aleksander Alekseev
Hi hackers,

I'm having some difficulties building the documentation on MacOS.

I'm using ./full-build.sh script from [1] repository. It worked just
fine for many years but since recently it started to fail like this:

```
/usr/bin/xsltproc --path . --stringparam pg.version '16devel'
/Users/eax/projects/c/pgscripts/../postgresql/doc/src/sgml/stylesheet.xsl
postgres-full.xml
error : Unknown IO error
warning: failed to load external entity
"http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl";
compilation error: file
/Users/eax/projects/c/pgscripts/../postgresql/doc/src/sgml/stylesheet.xsl
line 6 element import
xsl:import : unable to load
http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
error : Unknown IO error
/Users/eax/projects/c/postgresql/doc/src/sgml/stylesheet-html-common.xsl:4:
warning: failed to load external entity
"http://docbook.sourceforge.net/release/xsl/current/common/entities.ent";
%common.entities;
 ^
Entity: line 1:
 %common.entities;
  ^
[...]

```

This is not a network problem. I can download chunk.xsl with wget and
also build the documentation on my Linux laptop.

I've tried `brew reinstall` and also:

```
./configure ... XMLLINT="xmllint --nonet" XSLTPROC="xsltproc --nonet"
```

... as suggested by the documentation [2] but it didn't change anything.

I checked the archive of pgsql-hackers@ but was unable to find
anything relevant.

I'm using MacOS Monterey 12.6.2.

```
$ brew info docbook
==> docbook: stable 5.1 (bottled)
...
$ brew info docbook-xsl
==> docbook-xsl: stable 1.79.2 (bottled)
...
```

At this point I could use a friendly piece of advice from the community.

[1]: https://github.com/afiskon/pgscripts/
[2]: https://www.postgresql.org/docs/15/docguide-toolsets.html

-- 
Best regards,
Aleksander Alekseev




Re: old_snapshot_threshold bottleneck on replica

2023-01-30 Thread Maxim Orlov
On Fri, 27 Jan 2023 at 18:18, Robert Haas  wrote:

>
> Interesting, but it's still not entirely clear to me from reading the
> comments why we should think that this is safe.
>

Overall, I think this is safe, because we do not change the algorithm here.
More specifically, threshold_timestamp is only used in a few cases:
1). To get the latest value by calling GetOldSnapshotThresholdTimestamp.
This will work, since we only change the synchronization here from the
spinlock to an atomic.
2). In TransactionIdLimitedForOldSnapshots, but no change in behaviour is
made here. The synchronization model stays the same as before the patch.
3). In SnapshotTooOldMagicForTest, which is a stub to make
old_snapshot_threshold tests appear "working". But there is no interaction
with threshold_xid here.

So, we have two use cases for the threshold_timestamp (a sketch of case b
follows below):
a). When threshold_timestamp is used in conjunction with threshold_xid,
we must use the spinlock to synchronize.
b). When threshold_timestamp is used without threshold_xid; in this case,
we can use atomic reads.
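
A sketch of case b), assuming threshold_timestamp becomes a 64-bit atomic
(illustration only, not the attached patch):

```
/*
 * Sketch for case b) above (assumption, not the actual patch): a plain
 * 64-bit atomic read is enough when threshold_timestamp is consulted on
 * its own; the spinlock is only needed when it must stay consistent
 * with threshold_xid.
 */
TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
	return (TimestampTz)
		pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
}
```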

-- 
Best regards,
Maxim Orlov.


Re: Add SHELL_EXIT_CODE to psql

2023-01-30 Thread Maxim Orlov
Unfortunately, there is a failure on FreeBSD:
https://cirrus-ci.com/task/6466749487382528

Maybe this patch needs to be rebased?

-- 
Best regards,
Maxim Orlov.


Re: Considering additional sort specialisation functions for PG16

2023-01-30 Thread John Naylor
I wrote:

> On Thu, Jan 26, 2023 at 7:15 PM David Rowley  wrote:
> > I think the slower sorts I found in [2] could also be partially caused
> > by the current sort specialisation comparators re-comparing the
> > leading column during a tie-break. I've not gotten around to disabling
> > the sort specialisations to see if and how much this is a factor for
> > that test.
>
> Right, that's worth addressing independently of the window function
consideration. I'm still swapping this area back in my head, but I believe
one issue is that state->base.onlyKey signals two things: "one sortkey, not
abbreviated". We could add a separate branch for "first key unabbreviated,
nkeys>1" -- I don't think we'd need to specialize, just branch -- and
instead of state->base.comparetup, call a set of analogous functions that
only handle keys 2 and above (comparetup_tail_* ? or possibly just add a
boolean parameter compare_first). That would not pose a huge challenge, I
think, since they're already written like this:
>
> /* Compare the leading sort key */
> compare = ApplySortComparator(...);
> if (compare != 0)
> return compare;
>
> /* Compare additional sort keys */
> ...
>
> The null/non-null separation would eliminate a bunch of branches in
inlined comparators, so we could afford to add another branch for number of
keys.

I gave this a go, and it turns out we don't need any extra branches in the
inlined comparators -- the new fallbacks are naturally written to account
for the "!onlyKey" case. If the first sortkey was abbreviated, call its
full comparator, otherwise skip to the next sortkey (if there isn't one, we
shouldn't have gotten here). The existing comparetup functions try the
simple case and then call the fallback (which could be inlined for them but
I haven't looked).

Tests pass, but I'm not sure yet if we need more tests. I don't have a
purpose-built benchmark at the moment, but I'll see if any of my existing
tests exercise this code path. I can also try the window function case
unless someone beats me to it.

--
John Naylor
EDB: http://www.enterprisedb.com
From 65b94223f8bdb726b97d695f0040c12197d5e65d Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 30 Jan 2023 17:10:00 +0700
Subject: [PATCH v1] Split out fallback functionality from comparetup*
 functions

Previously, if a specialized comparator find equal datum1 keys,
the comparetup function would call the full comparator on the
datum before proceeding with either the unabbreviated first key
or the second key.

Add a comparetup_fallback field for these that call special
fallback functions. The existing comparetup functions just call
ApplySortComparator where we can, than call our new fallback.
---
 src/backend/utils/sort/tuplesort.c |   6 +-
 src/backend/utils/sort/tuplesortvariants.c | 114 -
 src/include/utils/tuplesort.h  |   6 ++
 3 files changed, 96 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 9ca9835aab..54f64d6c2d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -513,7 +513,7 @@ qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 
 #if SIZEOF_DATUM >= 8
@@ -537,7 +537,7 @@ qsort_tuple_signed_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 #endif
 
@@ -561,7 +561,7 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 
 /*
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index eb6cfcfd00..e65ac49d9d 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -47,18 +47,24 @@ static void removeabbrev_datum(Tuplesortstate *state, SortTuple *stups,
 			   int count);
 static int	comparetup_heap(const SortTuple *a, const SortTuple *b,
 			Tuplesortstate *state);
+static int	comparetup_heap_fallback(const SortTuple *a, const SortTuple *b,
+			Tuplesortstate *state);
 static void writetup_heap(Tuplesortstate *state, LogicalTape *tape,
 		  SortTuple *stup);
 static void readtup_heap(Tuplesortstate *state, SortTuple *stup,
 		 LogicalTape *tape, unsigned int len);
 static int	comparetup_cluster(const SortTuple *a, const SortTuple *b,
 			   Tuplesortstate *state);
+static int	comparetup_cluster_fallback(const SortTuple *a, const SortTuple *b,
+			   Tuplesortstate *state);
 static void writetup_cluster(Tuplesortstate *state, LogicalTape *tape,
 			 S

Re: Deadlock between logrep apply worker and tablesync worker

2023-01-30 Thread vignesh C
On Mon, 30 Jan 2023 at 13:00, houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 30, 2023 2:32 PM Amit Kapila  
> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
> > >
> > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
> > > >
> > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > check in replorigin_drop_guts() which is a static function and
> > > > called from only one place, so, will it be required to check at both 
> > > > places?
> > >
> > > There is a possibility that the initial check to verify if replication
> > > origin exists in replorigin_drop_by_name was successful but later one
> > > of either table sync worker or apply worker process might have dropped
> > > the replication origin,
> > >
> >
> > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > drop happens after the patch acquires the lock on the origin.
>
> Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
> already lock the origin before that. I think the check in replorigin_drop_guts
> is a custom check after calling SearchSysCache1 to get the tuple, but the 
> error
> should not happen as no concurrent drop can be performed.
>
> To make it simpler, one idea is to move the code that getting the tuple from
> system cache to the replorigin_drop_by_name(). After locking the origin, we
> can try to get the tuple and do the existence check, and we can reuse
> this tuple to perform origin delete. In this approach we only need to check
> origin existence once after locking. BTW, if we do this, then we'd better 
> rename the
> replorigin_drop_guts() to something like replorigin_state_clear() as the 
> function
> only clear the in-memory information after that.
>
> The code could be like:
>
> ---
> replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> /*
>  * Lock the origin to prevent concurrent drops. We keep the lock 
> until the
>  * end of transaction.
>  */
> LockSharedObject(ReplicationOriginRelationId, roident, 0,
>  AccessExclusiveLock);
>
> tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
> if (!HeapTupleIsValid(tuple))
> {
> if (!missing_ok)
> elog(ERROR, "cache lookup failed for replication 
> origin with ID %d",
>  roident);
>
> return;
> }
>
> replorigin_state_clear(rel, roident, nowait);
>
> /*
>  * Now, we can delete the catalog entry.
>  */
> CatalogTupleDelete(rel, &tuple->t_self);
> ReleaseSysCache(tuple);
>
> CommandCounterIncrement();
> ...

+1 for this change as it removes the redundant check which is not
required. I will post an updated version for this.

Regards,
Vignesh




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-30 Thread Takamichi Osumi (Fujitsu)
On Monday, January 30, 2023 12:02 PM Kyotaro Horiguchi 
 wrote:
> At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)"
>  wrote in
> > On Friday, January 27, 2023 8:00 PM Amit Kapila
>  wrote:
> > > So, you have changed min_apply_delay from int64 to int32, but you
> > > haven't mentioned the reason for the same? We use 'int' for the
> > > similar parameter recovery_min_apply_delay, so, ideally, it makes
> > > sense but still better to tell your reason explicitly.
> > Yes. It's because I thought I need to make this feature consistent with the
> recovery_min_apply_delay.
> > This feature handles the range same as the recovery_min_apply delay
> > from 0 to INT_MAX now so should be adjusted to match it.
> 
> INT_MAX can stick out of int32 on some platforms. (I'm not sure where that
> actually happens, though.) We can use PG_INT32_MAX instead.
> 
> IMHO, I think we don't use int as a catalog column and I agree that
> int32 is sufficient since I don't think more than 49 days delay is practical. 
> On
> the other hand, maybe I wouldn't want to use int32 for intermediate
> calculations.
Hi, Horiguchi-san. Thanks for your comments !


IIUC, in the last sentence you proposed that the type of
SubOpts' min_apply_delay should be changed to "int". But
I couldn't find any actual harm in the current code, because
we insert the SubOpts value into the catalog anyway after holding it in SubOpts.
Also, it seems there is no explicit rule that we should use "int" local
variables for "int32" system catalog values internally. I had a look at other
variables for int32 system catalog members and they looked fine either way.

So, I'd like to keep the current code as it is until actual harm is found.
The latest patch can be seen in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373E26884C385EFFFB8965FEDD39%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi





Re: Generating code for query jumbling through gen_node_support.pl

2023-01-30 Thread Michael Paquier
On Fri, Jan 27, 2023 at 11:59:47AM +0900, Michael Paquier wrote:
> Using that, I can compile the following results for various cases (-O2
> and compute_query_id=on):
>   query                  |  mode  | iterations | avg_runtime_ns | avg_jumble_ns
> -------------------------+--------+------------+----------------+---------------
>  begin                   | string |       5000 |        4.53116 |          4.54
>  begin                   | jumble |       5000 |       30.94578 |         30.94
>  commit                  | string |       5000 |        4.76004 |          4.74
>  commit                  | jumble |       5000 |        31.4791 |         31.48
>  create table 1 column   | string |       5000 |        7.22836 |          7.08
>  create table 1 column   | jumble |       5000 |      152.10852 |        151.96
>  create table 5 columns  | string |       5000 |       12.43412 |         12.28
>  create table 5 columns  | jumble |       5000 |      352.88976 |         349.1
>  create table 20 columns | string |        500 |         49.591 |          48.2
>  create table 20 columns | jumble |        500 |      2272.4066 |          2271
>  drop table 1 column     | string |       5000 |        6.70538 |          6.56
>  drop table 1 column     | jumble |       5000 |          50.38 |         50.24
>  drop table 5 columns    | string |       5000 |        6.88256 |          6.74
>  drop table 5 columns    | jumble |       5000 |       50.02898 |          49.9
>  SET work_mem            | string |       5000 |        7.28752 |          7.28
>  SET work_mem            | jumble |       5000 |       91.66588 |         91.64
> (16 rows)

Just to close the loop here, I have done more measurements to compare
the jumble done for some DMLs and some SELECTs between HEAD and the
patch (forgot to post some last Friday).  Both methods show comparable
results:
query |  mode  | iterations | avg_runtime_ns | avg_jumble_ns 
--++++---
 insert table 10 cols | master |   5000 |  377.17878 |377.04
 insert table 10 cols | jumble |   5000 |  409.47924 |409.34
 insert table 20 cols | master |   5000 |  692.94924 | 692.8
 insert table 20 cols | jumble |   5000 |   710.0901 |709.96
 insert table 5 cols  | master |   5000 |  232.44308 | 232.3
 insert table 5 cols  | jumble |   5000 |  253.49854 |253.36
 select 10 cols   | master |   5000 |  449.13608 |383.36
 select 10 cols   | jumble |   5000 |  491.61912 |323.86
 select 5 cols| master |   5000 |277.477 |277.46
 select 5 cols| jumble |   5000 |  323.88152 |323.86
(10 rows)

The averages are in ns, so the difference does not bother me much.
There may be some noise mixed in that ;)

(Attached is the tweak I have applied on HEAD to get some numbers.)
--
Michael
diff --git a/src/backend/nodes/queryjumblefuncs.c 
b/src/backend/nodes/queryjumblefuncs.c
index 16084842a3..c66090bde3 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -101,10 +101,14 @@ CleanQuerytext(const char *query, int *location, int *len)
 JumbleState *
 JumbleQuery(Query *query, const char *querytext)
 {
+#define MAX_QUERY_COMPUTE 5000
+
JumbleState *jstate = NULL;
 
Assert(IsQueryIdEnabled());
 
+   elog(WARNING, "start JumbleQuery");
+
if (query->utilityStmt)
{
query->queryId = compute_utility_query_id(querytext,
@@ -114,18 +118,26 @@ JumbleQuery(Query *query, const char *querytext)
else
{
jstate = (JumbleState *) palloc(sizeof(JumbleState));
-
-   /* Set up workspace for query jumbling */
jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-   jstate->jumble_len = 0;
jstate->clocations_buf_size = 32;
jstate->clocations = (LocationLen *)
palloc(jstate->clocations_buf_size * 
sizeof(LocationLen));
-   jstate->clocations_count = 0;
-   jstate->highest_extern_param_id = 0;
 
-   /* Compute query ID and mark the Query node with it */
-   JumbleQueryInternal(jstate, query);
+   for (int i = 0; i < MAX_QUERY_COMPUTE ; i++)
+   {
+   /* Set up workspace for query jumbling */
+   memset(jstate->jumble, 0, sizeof(JUMBLE_SIZE));
+   jstate->jumble_len = 0;
+   jstate->clocations_buf_size = 32;
+   memset(jstate->clocations, 0,
+  jstate->clocations_buf_size * 
sizeof(LocationLen));
+
+   jstate->clocations_count = 0;
+   jstate->highest_extern_p

Re: [DOCS] Stats views and functions not in order?

2023-01-30 Thread Peter Eisentraut

On 30.01.23 07:12, Peter Smith wrote:

Meanwhile, this pagination topic has strayed far away from the
original $SUBJECT, so I guess since there is nothing else pending this
thread's CF entry [1] can just be marked as "Committed" now?


done





Re: Generating code for query jumbling through gen_node_support.pl

2023-01-30 Thread Peter Eisentraut

On 27.01.23 04:07, Michael Paquier wrote:

On Thu, Jan 26, 2023 at 09:39:05AM +0100, Peter Eisentraut wrote:

There are a couple of repetitive comments, like "typmod and collation
information are irrelevant for the query jumbling".  This applies to all
nodes, so we don't need to repeat it for a number of nodes (and then not
mention it for other nodes).  Maybe there should be a central place
somewhere that describes "these kinds of fields should normally be ignored".


The place that would make the most sense to me to centralize this
knowledge is nodes.h itself, because that's where the node attributes
are defined?


Either that or src/backend/nodes/README.




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-30 Thread Peter Eisentraut

On 27.01.23 03:59, Michael Paquier wrote:

At the end, that would be unnoticeable for the average user, I guess,
but here are the numbers I get on my laptop :)


Personally, I think we do not want the two jumble methods in parallel.

Maybe there are other opinions.

I'm going to set this thread as "Ready for Committer".  Either wait a 
bit for more feedback on this topic, or just go ahead with either 
solution.  We can leave it as a semi-open item for reconsideration later.






Re: Allow logical replication to copy tables in binary format

2023-01-30 Thread Melih Mutlu
Hi Bharath,

Thanks for reviewing.

Bharath Rupireddy wrote on Wed, 18 Jan 2023 at 10:17:

> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu 
> wrote:
> 1. The performance numbers posted upthread [1] look impressive for the
> use-case tried, that's a 2.25X improvement or 55.6% reduction in
> execution times. However, it'll be great to run a few more varied
> tests to confirm the benefit.
>

Sure, do you have any specific test case or suggestion in mind?


> 2. It'll be great to capture the perf report to see if the time spent
> in copy_table() is reduced with the patch.
>

Will provide that in another email soon.


> 3. I think blending initial table sync's binary copy option with
> data-type level binary send/receive is not good. Moreover, data-type
> level binary send/receive has its own restrictions per 9de77b5453.
> IMO, it'll be good to have a new option, say copy_data_format synonyms
> with COPY command's FORMAT option but with only text (default) and
> binary values.
>

Added a "copy_format" option for subscriptions with text as default value.
So it would be possible to create a binary subscription but copy tables in
text format to avoid restrictions that you're concerned about.


> 4. Why to add tests to existing 002_types.pl? Can we add a new file
> with all the data types covered?
>

Since 002_types.pl is where the replication of data types is covered, I
thought it would be okay to test replication with the binary option in that
file.
Sure, we can add a new file with different data types for testing
subscriptions with the binary option. But then I feel like it would have too
many duplicated lines with 002_types.pl.
If you think that 002_types.pl lacks some data types that need to be tested,
then we should add those into 002_types.pl too, whether we test subscriptions
with the binary option in that file or some other place, right?


> 5. It's not clear to me as to why these rows were removed in the patch?
> -(1, '{1, 2, 3}'),
> -(2, '{2, 3, 1}'),
>  (3, '{3, 2, 1}'),
>  (4, '{4, 3, 2}'),
>  (5, '{5, NULL, 3}');
>
>  -- test_tbl_arrays
>  INSERT INTO tst_arrays (a, b, c, d) VALUES
> -('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1
> day", "2 days", "3 days"}'),
> -('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2
> minutes", "3 minutes", "1 minute"}'),
>

Previously, it wasn't actually testing the initial table sync since all
tables were empty when the subscription was created.
I simply split the initially inserted data so that the initial table sync is
actually exercised.

With this patch, it inserts the first two rows for all data types before
subscriptions get created.
You can see these lines:

> +# Insert initial test data
> +$node_publisher->safe_psql(
> +   'postgres', qq(
> +   -- test_tbl_one_array_col
> +   INSERT INTO tst_one_array (a, b) VALUES
> +   (1, '{1, 2, 3}'),
> +   (2, '{2, 3, 1}');
> +
> +   -- test_tbl_arrays
> +   INSERT INTO tst_arrays (a, b, c, d) VALUES




> 6. BTW, the subbinary description is missing in pg_subscription docs
> https://www.postgresql.org/docs/devel/catalog-pg-subscription.html?
> -  Specifies whether the subscription will request the publisher to
> -  send the data in binary format (as opposed to text).
> +  Specifies whether the subscription will copy the initial data to
> +  synchronize relations in binary format and also request the
> publisher
> +  to send the data in binary format too (as opposed to text).
>

Done.


> 7. A nitpick - space is needed after >= before 16.
> +if (walrcv_server_version(LogRepWorkerWalRcvConn) >=16 &&
>

Done.


> 8. Note that the COPY binary format isn't portable across platforms
> (Windows to Linux for instance) or major versions
> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical
> replication is
> https://www.postgresql.org/docs/devel/logical-replication.html.
> I don't see any handling of such cases in copy_table but only a check
> for the publisher version. I think we need to account for all the
> cases - allow binary COPY only when publisher and subscriber are of
> same versions, architectures, platforms. The trick here is how we
> identify if the subscriber is of the same type and taste
> (architectures and platforms) as the publisher. Looking for
> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not
> sure if there's a better way to do it.
>

I think having the "copy_format" option with text as the default, as I
replied to your 3rd review comment above, will keep things as they are now.
The patch simply also allows users to choose binary copy, if they want it
and accept the limitations that come with it.
The COPY command's portability issues shouldn't be a problem right now,
since the patch still supports the text format. Right?


> Also, the COPY binary format is very data type specific - per the docs
> "for example it will not 

Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Aleksander Alekseev
Hi hackers,

> At this point I could use a friendly piece of advice from the community.

I've found a solution:

```
export SGML_CATALOG_FILES=/usr/local/etc/xml/catalog
export XMLLINT="xmllint --catalogs"
export XSLTPROC="xsltproc --catalogs"
```

I will submit a patch for the documentation in a bit, after I'll check
it properly.

-- 
Best regards,
Aleksander Alekseev




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-30 Thread Amit Kapila
On Thu, Dec 8, 2022 at 8:17 AM Masahiko Sawada  wrote:
>
> On Mon, Nov 21, 2022 at 4:31 PM Amit Kapila  wrote:
>
> One idea to fix this issue is that in
> ReplicationSlotsComputeRequiredXmin(), we compute the minimum xmin
> while holding both ProcArrayLock and ReplicationSlotControlLock, and
> release only ReplicationSlotsControlLock before updating the
> replication_slot_xmin. I'm concerned it will increase the contention
> on ProcArrayLock but I've attached the patch for discussion.
>

But what kind of workload are you worried about? This will be called
while processing XLOG_RUNNING_XACTS to update
procArray->replication_slot_xmin/procArray->replication_slot_catalog_xmin
only when required. So, if we want we can test some concurrent
workloads along with walsenders doing the decoding to check if it
impacts performance.

What other way we can fix this? Do you think we can try to avoid
retreating xmin values in ProcArraySetReplicationSlotXmin() to avoid
this problem? Personally, I think taking the lock as proposed by your
patch is a better idea. BTW, this problem seems to be only logical
replication specific, so if we are too worried then we can change this
locking only for logical replication.
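
Spelled out, the proposed locking order would be roughly as follows (a
sketch with assumed details, not Sawada-san's attached patch):

```
/*
 * Sketch of the proposal above (assumed details, not the attached patch):
 * keep holding ProcArrayLock across both the computation and the update,
 * releasing only ReplicationSlotControlLock in between, so the computed
 * xmin and its publication in the ProcArray happen atomically with
 * respect to other backends.
 */
void
ReplicationSlotsComputeRequiredXmin(bool already_locked)
{
	TransactionId agg_xmin = InvalidTransactionId;
	TransactionId agg_catalog_xmin = InvalidTransactionId;

	if (!already_locked)
		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
	/* ... scan all slots and compute agg_xmin / agg_catalog_xmin ... */
	LWLockRelease(ReplicationSlotControlLock);

	/* Still holding ProcArrayLock while publishing the new values. */
	ProcArraySetReplicationSlotXmin(agg_xmin, agg_catalog_xmin, true);

	if (!already_locked)
		LWLockRelease(ProcArrayLock);
}
```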

-- 
With Regards,
Amit Kapila.




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-30 Thread Amit Kapila
On Mon, Jan 30, 2023 at 11:34 AM Amit Kapila  wrote:
>
> I have reproduced it manually. For this, I had to manually make the
> debugger call ReplicationSlotsComputeRequiredXmin(false) via path
> SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot()->LogicalConfirmReceivedLocation()
> ->ReplicationSlotsComputeRequiredXmin(false) for the apply worker. The
> sequence of events is something like (a) the replication_slot_xmin for
> tablesync worker is overridden by apply worker as zero as explained in
> Sawada-San's email, (b) another transaction happened on the publisher
> that will increase the value of ShmemVariableCache->nextXid (c)
> tablesync worker invokes
> SnapBuildInitialSnapshot()->GetOldestSafeDecodingTransactionId() which
> will return an oldestSafeXid which is higher than snapshot's xmin.
> This happens because replication_slot_xmin has an InvalidTransactionId
> value and we won't consider replication_slot_catalog_xmin because
> catalogOnly flag is false and there is no other open running
> transaction. I think we should try to get a simplified test to
> reproduce this problem if possible.
>

Here are steps to reproduce it manually with the help of a debugger:

Session-1
==
select pg_create_logical_replication_slot('s', 'test_decoding');
create table t2(c1 int);
select pg_replication_slot_advance('s', pg_current_wal_lsn()); --
Debug this statement. Stop before taking procarraylock in
ProcArraySetReplicationSlotXmin.

Session-2

psql -d postgres
Begin;

Session-3
===
psql -d "dbname=postgres replication=database"

begin transaction isolation level repeatable read read only;
CREATE_REPLICATION_SLOT slot1 LOGICAL test_decoding USE_SNAPSHOT;
--Debug this statement. Stop in SnapBuildInitialSnapshot() before
taking procarraylock

Session-1
==
Continue debugging and finish execution of
ProcArraySetReplicationSlotXmin. Verify
procArray->replication_slot_xmin is zero.

Session-2
=
Select txid_current();
Commit;

Session-3
==
Continue debugging.
Verify that safeXid follows snap->xmin. This leads to assertion (in
back branches) or error (in HEAD).

-- 
With Regards,
Amit Kapila.




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-30 Thread Amit Kapila
On Fri, Jan 27, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for making the patch! I'm still considering whether this approach is
> correct, but I can put a comment to your patch anyway.
>
> ```
> -   Assert(!already_locked || LWLockHeldByMe(ProcArrayLock));
> -
> -   if (!already_locked)
> -   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> +   Assert(LWLockHeldByMe(ProcArrayLock));
> ```
>
> In this function, we regard that the ProcArrayLock has been already acquired 
> as
> exclusive mode and modify data. I think LWLockHeldByMeInMode() should be used
> instead of LWLockHeldByMe().
>

Right, this is even evident from the comments atop
ReplicationSlotsComputeRequiredXmin("If already_locked is true,
ProcArrayLock has already been acquired exclusively.". But, I am not
sure if it is a good idea to remove 'already_locked' parameter,
especially in back branches as this is an exposed API.
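
Concretely, the stricter assertion being suggested would look like this
(sketch):

```
/* Check not just that ProcArrayLock is held, but that it is held exclusively. */
Assert(LWLockHeldByMeInMode(ProcArrayLock, LW_EXCLUSIVE));
```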

-- 
With Regards,
Amit Kapila.




Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-01-30 Thread Nitin Jadhav
> I kind of think this is a lot of unnecessary work.  The case that is
> problematic is a GUC that's marked GUC_NO_SHOW_ALL but not marked
> GUC_NOT_IN_SAMPLE.  There aren't any of those, and I don't think there
> are likely to be any in future, because it doesn't make a lot of sense.
> Why don't we just make a policy against doing that, and enforce it
> with an assertion somewhere in GUC initialization?
>
> Looking at guc.sql, I think that these is a second mistake: the test
> checks for (no_show_all AND !no_reset_all) but this won't work
> because NO_SHOW_ALL GUCs cannot be scanned via SQL.  There are two
> parameters that include this combination of flags: default_with_oids
> and ssl_renegotiation_limit, so the check would not do what it
> should.  I think that this part should be removed.

Thanks Michael for identifying a new mistake. I am a little confused
here. I don't understand why GUC_NO_SHOW_ALL should depend on other
flags like GUC_NOT_IN_SAMPLE or GUC_NO_RESET_ALL. It looks like the
dependency between GUC_NO_RESET_ALL and GUC_NO_SHOW_ALL is removed in
the above patch, since we have such combinations now. Similarly, why
can't we have a GUC marked as GUC_NO_SHOW_ALL but not GUC_NOT_IN_SAMPLE?
For me it makes sense if a GUC is marked NO_SHOW_ALL and it can still be
present in a sample file. It's up to the author/developer to choose
which flags are applicable to a newly introduced GUC. Please share your
thoughts.

Thanks & Regards,
Nitin Jadhav

On Mon, Jan 30, 2023 at 10:36 AM Michael Paquier  wrote:
>
> On Sun, Jan 29, 2023 at 01:05:07PM -0500, Tom Lane wrote:
> > I kind of think this is a lot of unnecessary work.  The case that is
> > problematic is a GUC that's marked GUC_NO_SHOW_ALL but not marked
> > GUC_NOT_IN_SAMPLE.  There aren't any of those, and I don't think there
> > are likely to be any in future, because it doesn't make a lot of sense.
> > Why don't we just make a policy against doing that, and enforce it
> > with an assertion somewhere in GUC initialization?
>
> [..thinks..]
>
> Looking at guc.sql, I think that these is a second mistake: the test
> checks for (no_show_all AND !no_reset_all) but this won't work
> because NO_SHOW_ALL GUCs cannot be scanned via SQL.  There are two
> parameters that include this combination of flags: default_with_oids
> and ssl_renegotiation_limit, so the check would not do what it
> should.  I think that this part should be removed.
>
> For the second part to prevent GUCs to be marked as NO_SHOW_ALL &&
> !NOT_IN_SAMPLE, check_GUC_init() looks like the right place to me,
> because this routine has been designed exactly for this purpose.
>
> So, what do you think about the attached?
> --
> Michael
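
For reference, the check being discussed for check_GUC_init() would look
roughly like this (a sketch with assumed variable names, not Michael's
attached patch):

```
/*
 * Sketch only (gconf assumed to be the struct config_generic * being
 * checked): disallow GUCs that are hidden from SHOW ALL yet expected to
 * appear in postgresql.conf.sample, since that combination makes no sense.
 */
Assert(!((gconf->flags & GUC_NO_SHOW_ALL) &&
		 !(gconf->flags & GUC_NOT_IN_SAMPLE)));
```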




Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Aleksander Alekseev
Hi hackers,

> I've found a solution:
>
> ```
> export SGML_CATALOG_FILES=/usr/local/etc/xml/catalog
> export XMLLINT="xmllint --catalogs"
> export XSLTPROC="xsltproc --catalogs"
> ```
>
> I will submit a patch for the documentation in a bit, after I'll check
> it properly.

PFA the patch.

I don't have a strong opinion regarding any particular wording and
would like to ask the committer to change it as he sees fit.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Document-the-workaround-for-xsltproc-when-buildin.patch
Description: Binary data


Re: Deadlock between logrep apply worker and tablesync worker

2023-01-30 Thread vignesh C
On Mon, 30 Jan 2023 at 13:00, houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 30, 2023 2:32 PM Amit Kapila  
> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
> > >
> > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
> > > >
> > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > check in replorigin_drop_guts() which is a static function and
> > > > called from only one place, so, will it be required to check at both 
> > > > places?
> > >
> > > There is a possibility that the initial check to verify if replication
> > > origin exists in replorigin_drop_by_name was successful but later one
> > > of either table sync worker or apply worker process might have dropped
> > > the replication origin,
> > >
> >
> > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > drop happens after the patch acquires the lock on the origin.
>
> Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
> already lock the origin before that. I think the check in replorigin_drop_guts
> is a custom check after calling SearchSysCache1 to get the tuple, but the 
> error
> should not happen as no concurrent drop can be performed.
>
> To make it simpler, one idea is to move the code that getting the tuple from
> system cache to the replorigin_drop_by_name(). After locking the origin, we
> can try to get the tuple and do the existence check, and we can reuse
> this tuple to perform origin delete. In this approach we only need to check
> origin existence once after locking. BTW, if we do this, then we'd better 
> rename the
> replorigin_drop_guts() to something like replorigin_state_clear() as the 
> function
> only clear the in-memory information after that.
>
> The code could be like:
>
> ---
> replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> /*
>  * Lock the origin to prevent concurrent drops. We keep the lock 
> until the
>  * end of transaction.
>  */
> LockSharedObject(ReplicationOriginRelationId, roident, 0,
>  AccessExclusiveLock);
>
> tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
> if (!HeapTupleIsValid(tuple))
> {
> if (!missing_ok)
> elog(ERROR, "cache lookup failed for replication 
> origin with ID %d",
>  roident);
>
> return;
> }
>
> replorigin_state_clear(rel, roident, nowait);
>
> /*
>  * Now, we can delete the catalog entry.
>  */
> CatalogTupleDelete(rel, &tuple->t_self);
> ReleaseSysCache(tuple);
>
> CommandCounterIncrement();

The attached updated patch has the changes to handle the same.

Regards,
Vignesh
From dc2e9acc6a55772896117cf3b88e4189f994a82d Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 27 Jan 2023 15:17:09 +0530
Subject: [PATCH v2] Lock the replication origin record instead of locking the
 pg_replication_origin relation.

Lock the replication origin record instead of locking the
pg_replication_origin relation.
---
 src/backend/replication/logical/origin.c | 56 
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index b754c43840..3e360cf41e 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -338,16 +338,14 @@ replorigin_create(const char *roname)
  * Helper function to drop a replication origin.
  */
 static void
-replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
+replorigin_state_clear(RepOriginId roident, bool nowait)
 {
-	HeapTuple	tuple;
 	int			i;
 
 	/*
-	 * First, clean up the slot state info, if there is any matching slot.
+	 * Clean up the slot state info, if there is any matching slot.
 	 */
 restart:
-	tuple = NULL;
 	LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
 
 	for (i = 0; i < max_replication_slots; i++)
@@ -402,19 +400,6 @@ restart:
 	}
 	LWLockRelease(ReplicationOriginLock);
 	ConditionVariableCancelSleep();
-
-	/*
-	 * Now, we can delete the catalog entry.
-	 */
-	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "cache lookup failed for replication origin with ID %d",
-			 roident);
-
-	CatalogTupleDelete(rel, &tuple->t_self);
-	ReleaseSysCache(tuple);
-
-	CommandCounterIncrement();
 }
 
 /*
@@ -427,24 +412,37 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
 {
 	RepOriginId roident;
 	Relation	rel;
+	HeapTuple	tuple;
 
 	Assert(IsTransactionState());
 
-	/*
-	 * To interlock against concurrent drops, we hold ExclusiveLock on
-	 * pg_replication_origin till xact commit.
-	 *
-	 * XXX We can optimize this by acquiring the lock on a specific origin by
-	 * using LockSharedObject if required

Re: generate_series for timestamptz and time zone problem

2023-01-30 Thread Przemysław Sztoch


Gurjeet Singh wrote on 30.01.2023 08:18:

On Sat, Nov 12, 2022 at 10:44 AM Tom Lane  wrote:

However, what it shouldn't be is part of this patch.  It's worth
pushing it separately to have a record of that decision.  I've
now done that, so you'll need to rebase to remove that delta.

I looked over the v5 patch very briefly, and have two main
complaints:

* There's no documentation additions.  You can't add a user-visible
function without adding an appropriate entry to func.sgml.

* I'm pretty unimpressed with the whole truncate-to-interval thing
and would recommend you drop it.  I don't think it's adding much
useful functionality beyond what we can already do with the existing
date_trunc variants; and the definition seems excessively messy
(too many restrictions and special cases).

Please see attached v6 of the patch.

The changes since v5 are:
.) Rebased and resolved conflicts caused by commit 533e02e92.
.) Removed code and tests related to new date_trunc() functions, as
suggested by Tom.
.) Added 3 more variants to accompany date_add(tstz, interval, zone):
 date_add(tstz, interval)
 date_subtract(tstz, interval)
 date_subtract(tstz, interval, zone)

.) Eliminate duplication of code; use common function to implement
generate_series_timestamptz[_at_zone]() functions.
.) Fixed a bug where one of the new code paths,
generate_series_timestamptz_with_zone(), did not perform the
TIMESTAMP_NOT_FINITE() check.
.) Replaced some DirectFunctionCall?() with direct calls to the
relevant *_internal() function; should be better for performance.
.) Added documentation for all 5 functions (2 date_add(), 2
date_subtract(), and 1 overloaded version of generate_series()).

Other work distracted me from this patch.
I looked at your updated v6 and it looks OK.
For me the date_trunc function is important and I still have some corner
cases. I will now continue working on date_trunc in a separate patch.

I'm not sure of the convention around authorship. But since this was
not an insignificant amount of work, would this patch be considered
co-authored by Przemyslaw and myself? Should I add myself to the Authors
field in the Commitfest app?

I see no obstacles for us to be co-authors.

Hi Przemyslaw,
 I started working on this patch based on Tom's review a few days
ago, since you hadn't responded in a while, and I presumed you're not
working on this anymore. I should've consulted with/notified you of my
intent before starting to work on it, to avoid duplication of work.
Sorry if this submission obviates any work you have in progress.
Please feel free to provide your feedback on the v6 of the patch.
I propose to get approval for the current, trimmed-down version of the
patch. As I wrote above, I will continue work on date_trunc later, as
a separate patch.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Allow logical replication to copy tables in binary format

2023-01-30 Thread Bharath Rupireddy
On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu  wrote:
>

Thanks for providing an updated patch.

>> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu  wrote:
>> 1. The performance numbers posted upthread [1] look impressive for the
>> use-case tried, that's a 2.25X improvement or 55.6% reduction in
>> execution times. However, it'll be great to run a few more varied
>> tests to confirm the benefit.
>
> Sure, do you have any specific test case or suggestion in mind?

Load a large amount of pre-existing data into the publisher's table (with a
mix of columns like int, text, double, bytea and so on) so that the
subscriber's table sync workers have to do a "good" amount of work to copy,
then measure the copy_table time with and without the patch.

>> 2. It'll be great to capture the perf report to see if the time spent
>> in copy_table() is reduced with the patch.
>
> Will provide that in another email soon.

Thanks.

>> 4. Why to add tests to existing 002_types.pl? Can we add a new file
>> with all the data types covered?
>
> Since 002_types.pl is where the replication of data types is covered, I
> thought it would be okay to test replication with the binary option in that
> file.
> Sure, we can add a new file with different data types for testing
> subscriptions with the binary option. But then I feel like it would have too
> many lines duplicated from 002_types.pl.
> If you think that 002_types.pl lacks some data types that need to be tested,
> then we should add those into 002_types.pl too, whether we test subscriptions
> with the binary option in that file or some other place, right?
>
> Previously, it wasn't actually testing the initial table sync since all 
> tables were empty when subscription was created.
> I just simply split the data initially inserted to test initial table sync.
>
> With this patch, it inserts the first two rows for all data types before 
> subscriptions get created.
> You can see these lines:

It'd be better and clearer to have a separate TAP test file IMO since
the focus of the feature here isn't to just test for data types. With
separate tests, you can verify "ERROR:  incorrect binary data format
in logical replication column 1" cases.

>> 8. Note that the COPY binary format isn't portable across platforms
>> (Windows to Linux for instance) or major versions
>> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical
>> replication is 
>> https://www.postgresql.org/docs/devel/logical-replication.html.
>> I don't see any handling of such cases in copy_table but only a check
>> for the publisher version. I think we need to account for all the
>> cases - allow binary COPY only when publisher and subscriber are of
>> same versions, architectures, platforms. The trick here is how we
>> identify if the subscriber is of the same type and taste
>> (architectures and platforms) as the publisher. Looking for
>> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not
>> sure if there's a better way to do it.
>
> I think having the "copy_format" option with text as default, like I replied 
> to your 3rd review above, will keep things as they are now.
> The patch now will only allow users to choose binary copy as well, if they 
> want it and acknowledge the limitations that come with binary copy.
> COPY command's portability issues shouldn't be an issue right now, since the 
> patch still supports text format. Right?

With the above said, do you think checking for the publisher's version is
needed? The user can decide to enable/disable binary COPY based on the
publisher's version, no?
+/* If the publisher is v16 or later, specify the format to copy data. */
+if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16)
+{

Few more comments on v7:
1.
+  Specifies the format in which pre-existing data on the publisher will
+  be copied to the subscriber. Supported formats are
+  text and binary. The default is
+  text.
It'd be good to call out in the documentation the cases where
copy_format can be enabled and where it needs to be disabled.

2.
+ errmsg("%s value should be either \"text\" or \"binary\"",
How about "value must be either "?

3.
+if (!opts->binary &&
+opts->copy_format == LOGICALREP_COPY_AS_BINARY)
+{
+ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s and %s are mutually exclusive options",
+"binary = false", "copy_format = binary")));

+"CREATE SUBSCRIPTION tap_sub_binary CONNECTION
'$publisher_connstr' PUBLICATION tap_pub WITH (slot_name =
tap_sub_binary_slot, binary = true, copy_format = 'binary')"
Why should the subscription's binary option and copy_format option be
tied at all? Tying these two options hurts usability. Is there a
fundamental reason? I think they both are/must be independent. One
deals with data types and another deals with how initial table data is
copied.

--
Bharath Rupireddy
PostgreSQL Contributors Team

Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-30 Thread Masahiko Sawada
On Mon, Jan 30, 2023 at 8:30 PM Amit Kapila  wrote:
>
> On Fri, Jan 27, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Thank you for making the patch! I'm still considering whether this approach 
> > is
> > correct, but I can put a comment to your patch anyway.
> >
> > ```
> > -   Assert(!already_locked || LWLockHeldByMe(ProcArrayLock));
> > -
> > -   if (!already_locked)
> > -   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> > +   Assert(LWLockHeldByMe(ProcArrayLock));
> > ```
> >
> > In this function, we regard that the ProcArrayLock has been already 
> > acquired as
> > exclusive mode and modify data. I think LWLockHeldByMeInMode() should be 
> > used
> > instead of LWLockHeldByMe().
> >
>
> Right, this is even evident from the comments atop
> ReplicationSlotsComputeRequiredXmin("If already_locked is true,
> ProcArrayLock has already been acquired exclusively.").

Agreed, will fix in the next version patch.

> But, I am not
> sure if it is a good idea to remove 'already_locked' parameter,
> especially in back branches as this is an exposed API.

Yes, we should not remove the already_locked parameter in back
branches, so I was thinking of keeping it there.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-30 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thank you for updating the patch!
I checked your replies and the new patch, and it looks good.
Currently I have no comments.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-30 Thread Amit Kapila
On Fri, Jan 27, 2023 at 6:32 PM Önder Kalacı  wrote:
>>
>> I suppose the options are:
>> 1. use regular planner uniformly
>> 2. use regular planner only when there's no replica identity (or 
>> configurable?)
>> 3. only use low-level functions
>> 4. keep using sequential scans for every single updated row
>> 5. introduce a hidden logical row identifier in the heap that is guaranteed 
>> unique within a table and can be used as a replica identity when no unique 
>> index exists
>
>
> One other option I considered was to ask the index explicitly on the 
> subscriber side from the user when REPLICA IDENTITY is FULL. But, it is a 
> pretty hard choice for any user, even a planner sometimes fails to pick the 
> right index :)  Also, it is probably controversial to change any of the APIs 
> for this purpose?
>

I agree that it won't be a very convenient option for the user, but how
about this: along with asking the user for an index, we also allow making
use of any unique index over a subset of the transmitted columns (when the
user didn't provide an index), and if there's more than one candidate
index, pick any one. Additionally, we can allow disabling the use of an
index scan for this particular case. If we are too worried about an API
change for allowing users to specify the index, then we can do that later
or as a separate patch.

> I'd be happy to hear from more experienced hackers on the trade-offs for the 
> above, and I'd be open to work on that if there is a clear winner. For me (3) 
> is a decent solution for the problem.
>

From the discussion above it is not very clear that adding maintenance
costs in this area is worth it even though that can give better
results as far as this feature is concerned.

-- 
With Regards,
Amit Kapila.




Re: JSONPath Child Operator?

2023-01-30 Thread Filipp Krylov

Hi David,

On 2022-11-10 21:55, David E. Wheeler wrote:
My question: Are there plans to support square bracket syntax for JSON 
object field name strings like this? Or to update to follow the 
standard as it’s finalized?


This syntax is a part of "jsonpath syntax extensions" patchset: 
https://www.postgresql.org/message-id/e0fe4f7b-da0b-471c-b3da-d8adaf314357%40postgrespro.ru


-- Ph.




Re: dynamic result sets support in extended query protocol

2023-01-30 Thread Alvaro Herrera
On 2022-Nov-22, Peter Eisentraut wrote:

> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch.  This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need to spend more time on
> this and figure out how to repair it.

Applying this patch, your test queries seem to work correctly.

This is quite WIP, especially because there's a couple of scenarios
uncovered by tests that I'd like to ensure correctness about, but if you
would like to continue adding tests for extended query and dynamic
result sets, it may be helpful.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index ec62550e38..b530c51ccd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -2109,19 +2109,19 @@ PQgetResult(PGconn *conn)
break;
 
case PGASYNC_READY:
+   res = pqPrepareAsyncResult(conn);
 
/*
-* For any query type other than simple query protocol, we advance
-* the command queue here.  This is because for simple query
-* protocol we can get the READY state multiple times before the
-* command is actually complete, since the command string can
-* contain many queries.  In simple query protocol, the queue
-* advance is done by fe-protocol3 when it receives ReadyForQuery.
+* When an error has occurred, we consume one command from the
+* queue for each result we return.  (Normally, the command would
+* be consumed as each result is read from the server.)
 */
if (conn->cmd_queue_head &&
-   conn->cmd_queue_head->queryclass != PGQUERY_SIMPLE)
+   (conn->error_result ||
+    (conn->result != NULL &&
+     conn->result->resultStatus == PGRES_FATAL_ERROR)))
pqCommandQueueAdvance(conn);
-   res = pqPrepareAsyncResult(conn);
+
if (conn->pipelineStatus != PQ_PIPELINE_OFF)
{
/*
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 8ab6a88416..2ed74aa0f1 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -205,6 +205,10 @@ pqParseInput3(PGconn *conn)
pqSaveErrorResult(conn);
}
}
+   if (conn->cmd_queue_head &&
+       conn->cmd_queue_head->queryclass != PGQUERY_SIMPLE)
+       pqCommandQueueAdvance(conn);
+
if (conn->result)
strlcpy(conn->result->cmdStatus, conn->workBuffer.data,
CMDSTATUS_LEN);
@@ -231,6 +235,7 @@ pqParseInput3(PGconn *conn)
else
{
conn->pipelineStatus = PQ_PIPELINE_ON;
+   pqCommandQueueAdvance(conn);
conn->asyncStatus = PGASYNC_READY;
}
}
@@ -257,6 +262,7 @@ pqParseInput3(PGconn *conn)
pqSaveErrorResult(conn);
}
}
+   /* XXX should we advance the command queue here? */
conn->asyncStatus = PGASYNC_READY;
break;
case '1':   /* Parse Complete */
@@ -274,6 +280,7 @@ pqParseInput3(PGconn *conn)

pqSaveErrorResult(conn);
}
}
+

Re: HOT chain validation in verify_heapam()

2023-01-30 Thread Himanshu Upadhyaya
Hi Hackers,

On Sun, Jan 22, 2023 at 8:48 PM Himanshu Upadhyaya <
upadhyaya.himan...@gmail.com> wrote:

>
> The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
>> seems wrong to me. Shouldn't it be &&? Has this code been tested at
>> all?  It doesn't seem to have a test case. Some of these other errors
>> don't, either. Maybe there's some that we can't easily test in an
>> automated way, but we should test what we can. I guess maybe casual
>> testing wouldn't reveal the problem here because of the recheck, but
>> it's worrying to find logic that doesn't look right with no
>> corresponding comments or test cases.
>>
>> This is totally my Mistake, apologies for that. I will fix this in my
> next patch. Regarding the missing test cases, I need one in-progress
> transaction for these test cases to be included in 004_verify_heapam.pl
> but I don't find a clear way to have an in-progress transaction(as per the
> design of 004_verify_heapam.pl ) that I can use in the test cases. I will
> be doing more research on a solution to add these missing test cases.
>
>>
>> I am trying to add test cases related to in-progress transactions in
004_verify_heapam.pl but I am not able to find a proper way to achieve
this.
We have logic that manually corrupts each tuple.
Please refer to the code just after the below comment in
004_verify_heapam.pl

"# Corrupt the tuples, one type of corruption per tuple.  Some types of
# corruption cause verify_heapam to skip to the next tuple without
# performing any remaining checks, so we can't exercise the system properly
if
# we focus all our corruption on a single tuple."

Before this we stop the node with "$node->stop;" and only then proceed to the
manual corruption. This aborts all running/in-progress transactions.
So, if we create an in-progress transaction and comment out "$node->stop;",
then the code that we have for manual corruption no longer works.

Is it required to stop the server before proceeding with the manual
corruption?
If so, please suggest whether there is a way to get an in-progress
transaction that we can use for the manual corruption.
-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-30 Thread Takamichi Osumi (Fujitsu)
On Monday, January 30, 2023 7:05 PM I wrote:
> On Saturday, January 28, 2023 1:28 PM I wrote:
> > Attached the updated patch v24.
> I've conducted the rebase affected by the commit(1e8b61735c) by renaming
> the GUC to logical_replication_mode accordingly, because it's utilized in the
> TAP test of this time-delayed LR feature.
> There is no other change for this version.
> 
> Kindly have a look at the attached v25.
Hi,

The v25 caused a failure on the Windows run of cfbot in [1].
However, the failure happened in the pg_upgrade tests,
and the failure message looks the same as the one reported in the ongoing
discussion at [2].
So it's an issue independent of v25.

[1] - https://cirrus-ci.com/task/5484559622471680
[2] - 
https://www.postgresql.org/message-id/20220919213217.ptqfdlcc5idk5xup%40awork3.anarazel.de


Best Regards,
Takamichi Osumi





Re: Amcheck verification of GiST and GIN

2023-01-30 Thread Aleksander Alekseev
Hi Andrey,

> Thanks! I also found out that there was a CI complaint about amcheck.h
> not including some necessary stuff. Here's a version with a fix for
> that.

Thanks for the updated patchset.

One little nitpick I have is that the tests cover only cases where all
the checks pass successfully. The tests don't show that the checks
will fail if the indexes are corrupted. Usually we check this as well;
see bd807be6 and other amcheck-related patches and commits.

-- 
Best regards,
Aleksander Alekseev




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-30 Thread Masahiko Sawada
On Mon, Jan 30, 2023 at 3:23 PM houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 30, 2023 12:13 PM Peter Smith  
> wrote:
> >
> > Here are my review comments for v88-0002.
>
> Thanks for your comments.
>
> >
> > ==
> > General
> >
> > 1.
> > The test cases are checking the log content but they are not checking for
> > debug logs or untranslated elogs -- they are expecting a normal ereport LOG
> > that might be translated. I’m not sure if that is OK, or if it is a 
> > potential problem.
>
> We have tests that check ereport ERROR and ereport WARNING messages (by
> searching for the ERROR or WARNING keyword in all the TAP tests), so I think
> checking the LOG should be fine.
>
> > ==
> > doc/src/sgml/config.sgml
> >
> > 2.
> > On the publisher side, logical_replication_mode allows allows streaming or
> > serializing changes immediately in logical decoding. When set to immediate,
> > stream each change if streaming option (see optional parameters set by
> > CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set
> > to buffered, the decoding will stream or serialize changes when
> > logical_decoding_work_mem is reached.
> >
> > 2a.
> > typo "allows allows"  (Kuroda-san reported same)
> >
> > 2b.
> > "if streaming option" --> "if the streaming option"
>
> Changed.
>
> > ~~~
> >
> > 3.
> > On the subscriber side, if streaming option is set to parallel,
> > logical_replication_mode also allows the leader apply worker to send changes
> > to the shared memory queue or to serialize changes.
> >
> > SUGGESTION
> > On the subscriber side, if the streaming option is set to parallel,
> > logical_replication_mode can be used to direct the leader apply worker to
> > send changes to the shared memory queue or to serialize changes.
>
> Changed.
>
> > ==
> > src/backend/utils/misc/guc_tables.c
> >
> > 4.
> >   {
> >   {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> > - gettext_noop("Controls when to replicate each change."),
> > - gettext_noop("On the publisher, it allows streaming or serializing each
> > change in logical decoding."),
> > + gettext_noop("Controls the internal behavior of logical replication
> > publisher and subscriber"),
> > + gettext_noop("On the publisher, it allows streaming or "
> > + "serializing each change in logical decoding. On the "
> > + "subscriber, in parallel streaming mode, it allows "
> > + "the leader apply worker to serialize changes to "
> > + "files and notifies the parallel apply workers to "
> > + "read and apply them at the end of the transaction."),
> >   GUC_NOT_IN_SAMPLE
> >   },
> > Suggest re-wording the long description (subscriber part) to be more like 
> > the
> > documentation text.
> >
> > BEFORE
> > On the subscriber, in parallel streaming mode, it allows the leader apply 
> > worker
> > to serialize changes to files and notifies the parallel apply workers to 
> > read and
> > apply them at the end of the transaction.
> >
> > SUGGESTION
> > On the subscriber, if the streaming option is set to parallel, it directs 
> > the leader
> > apply worker to send changes to the shared memory queue or to serialize
> > changes and apply them at the end of the transaction.
> >
>
> Changed.
>
> Attach the new version patch which addressed all comments so far (the v88-0001
> has been committed, so we only have one remaining patch this time).
>

I have one comment on v89 patch:

+   /*
+* Using 'immediate' mode returns false to cause a switch to
+* PARTIAL_SERIALIZE mode so that the remaining changes will be serialized.
+*/
+   if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)
+   return false;
+

Probably we want to add unlikely() here, since we could pass
through this path very frequently?
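
Something like the following is what I have in mind -- just a sketch of the
suggested tweak applied to the quoted hunk, nothing more:

    /*
     * Using 'immediate' mode returns false to cause a switch to
     * PARTIAL_SERIALIZE mode so that the remaining changes will be
     * serialized.  unlikely() because this branch should almost never be
     * taken outside of tests that set this developer-only GUC.
     */
    if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE))
        return false;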

The rest looks good to me.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-30 Thread Masahiko Sawada
On Mon, Jan 30, 2023 at 1:31 PM John Naylor
 wrote:
>
> On Sun, Jan 29, 2023 at 9:50 PM Masahiko Sawada  wrote:
> >
> > On Sat, Jan 28, 2023 at 8:33 PM John Naylor
> >  wrote:
>
> > > The first implementation should be simple, easy to test/verify, easy to 
> > > understand, and easy to replace. As much as possible anyway.
> >
> > Yes, but if a concurrent writer waits for another process to finish
> > the iteration, it ends up waiting on a lwlock, which is not
> > interruptible.
> >
> > >
> > > > So the idea is that we set iter_active to true (with the
> > > > lock in exclusive mode), and prevent concurrent updates when the flag
> > > > is true.
> > >
> > > ...by throwing elog(ERROR)? I'm not so sure users of this API would 
> > > prefer that to waiting.
> >
> > Right. I think if we want to wait rather than an ERROR, the waiter
> > should wait in an interruptible way, for example, a condition
> > variable. I did a simpler way in the v22 patch.
> >
> > ...but looking at dshash.c, dshash_seq_next() seems to return an entry
> > while holding a lwlock on the partition. My assumption might be wrong.
>
> Using partitions there makes holding a lock less painful on average, I 
> imagine, but I don't know the details there.
>
> If we make it clear that the first committed version is not (yet) designed 
> for high concurrency with mixed read-write workloads, I think waiting (as a 
> protocol) is fine. If waiting is a problem for some use case, at that point 
> we should just go all the way and replace the locking entirely. In fact, it 
> might be good to spell this out in the top-level comment and include a link 
> to the second ART paper.

Agreed. Will update the comments.

>
> > > [thinks some more...] Is there an API-level assumption that hasn't been 
> > > spelled out? Would it help to have a parameter for whether the iteration 
> > > function wants to reserve the privilege to perform writes? It could take 
> > > the appropriate lock at the start, and there could then be multiple 
> > > read-only iterators, but only one read/write iterator. Note, I'm just 
> > > guessing here, and I don't want to make things more difficult for future 
> > > improvements.
> >
> > Seems a good idea. Given the use case for parallel heap vacuum, it
> > would be a good idea to support having multiple read-only writers. The
> > iteration of the v22 is read-only, so if we want to support read-write
> > iterator, we would need to support a function that modifies the
> > current key-value returned by the iteration.
>
> Okay, so updating during iteration is not currently supported. It could in 
> the future, but I'd say that can also wait for fine-grained concurrency 
> support. Intermediate-term, we should at least make it straightforward to 
> support:
>
> 1) parallel heap vacuum  -> multiple read-only iterators
> 2) parallel heap pruning -> multiple writers
>
> It may or may not be worth it for someone to actually start either of those 
> projects, and there are other ways to improve vacuum that may be more 
> pressing. That said, it seems the tid store with global locking would 
> certainly work fine for #1 and maybe "not too bad" for #2. #2 can also 
> mitigate waiting by using larger batching, or the leader process could 
> "pre-warm" the tid store with zero-values using block numbers from the 
> visibility map.

True. Using a larger batching method seems to be worth testing when we
implement the parallel heap pruning.

In the next version patch, I'm going to update the locking support
part and incorporate other comments I got.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: cutting down the TODO list thread

2023-01-30 Thread Bruce Momjian
On Mon, Jan 30, 2023 at 01:13:45PM +0700, John Naylor wrote:
> On Tue, Jan 24, 2023 at 11:57 PM Bruce Momjian  wrote:
> > I think we just point them at the TODO list and they will read the top
> > of the list first, no?  I think you are right that we updated the top of
> > the TODO but didn't update the places that link to it.  I am thinking we
> > should just trim down the text linking to it and let the top of the TODO
> > list do its job.
> 
> Okay. How about:
> 
> "It's worth checking if the feature of interest is found in the TODO list on
> our wiki: http://wiki.postgresql.org/wiki/TODO. The entries there often have
> additional information about the feature and may point to reasons why it 
> hasn't
> been implemented yet."

Good.

> > Wow, I would not send a new person to the SQL standard docs.  ;-)  I am
> > thinking we just don't have a good answer to this so let's say less.
> 
> Do I understand right that we could just remove this entire section "What 
> areas
> need work?"?

Yes, I think so.

> > > 2)
> > > from:
> > > "What do I do after choosing an item to work on?
> > >
> > > Send an email to pgsql-hackers with a proposal for what you want to do
> > > (assuming your contribution is not trivial). Working in isolation is not
> > > advisable because others might be working on the same TODO item, or you
> might
> > > have misunderstood the TODO item. In the email, discuss both the internal
> > > implementation method you plan to use, and any user-visible changes (new
> > > syntax, etc)."
> > >
> > > to:
> > > "What do I do after choosing an area to work on?
> > >
> > > Send an email to pgsql-hackers with a proposal for what you want to do
> > > (assuming your contribution is not trivial). Working in isolation is not
> >
> > Can new people identify trivial?
> 
> I'd say they have some idea about that, since we do regularly get typo fixes
> and doc clarifications. Sure there is some grey area, but I don't think the
> dividing point is important. The important thing is, we also sometimes get
> large and invasive patches without design discussion, which we want to
> discourage.

Agreed.

> > I can now see that just removing the [E] label totally is the right
> > answer.  Yes, there might be an easy item on there, but the fact we have
> > three labeled and they are not easy makes me think [E] is causing more
> > problems than it solves.
> 
> Okay, having heard no objections I'll remove it.

Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Non-superuser subscription owners

2023-01-30 Thread Robert Haas
On Fri, Jan 27, 2023 at 5:56 PM Mark Dilger
 wrote:
> If the owner cannot modify the subscription, then the owner degenerates into 
> a mere "run-as" user.  Note that this isn't how things work now, and even if 
> we disallowed owners from modifying the connection string, there would still 
> be other attributes the owner could modify, such as the set of publications 
> subscribed.

The proposed patch blocks every form of ALTER SUBSCRIPTION if
password_required false is set and you aren't a superuser. Is there
some other DML command that could be used to modify the set of
publications subscribed?

> More generally, my thinking on this thread is that there needs to be two 
> nosuperuser roles:  A higher privileged role which can create a subscription, 
> and a lower privileged role serving the "run-as" function.  Those shouldn't 
> be the same, because the "run-as" concept doesn't logically need to have 
> subscription creation power, and likely *shouldn't* have that power.  
> Depending on which sorts of attributes a subscription object has, such as the 
> connection string, the answer differs for whether the owner/"run-as" user 
> should get to change those attributes.  One advantage of Jeff's idea of using 
> a server object rather than a string is that it becomes more plausibly safe 
> to allow the subscription owner to make changes to that attribute of the 
> subscription.

There's some question in my mind about what these different mechanisms
are intended to accomplish.

On a technical level, I think that the idea of having a separate
objection for the connection string vs. the subscription itself is
perfectly sound, and to repeat what I said earlier, if someone wants
to implement that, cool. I also agree that it has the advantage that
you specify, namely, that someone can have rights to modify one of
those objects but not the other. What that lets you do is define a
short list of known systems and say, hey, you can replicate whatever
tables you want with whatever options you want, but only between these
systems. I'm not quite sure what problem that solves, though.

From my point of view, the two things that the superuser is most
likely to want to do are (1) control the replication setup themselves
and delegate nothing to any non-superuser or (2) give a non-superuser
pretty much complete control over replication with just enough
restrictions to avoid letting them do things that would compromise
security, such as hacking the local superuser account. In other words,
I expect that delegation of the logical replication configuration is
usually going to be all or nothing. Jeff's system allows for a
situation where you want to delegate some stuff but not everything,
and specifically where you want to dedicate control over the
subscription options and the tables being replicated, but not the
connection strings. To me, that feels like a bit of an awkward
configuration; I don't really understand in what situation that
division of responsibility would be particularly useful. I trust that
Jeff is proposing it because he knows of such a situation, but I don't
know what it is. I feel like, even if I wanted to let people use some
connection strings and not others, I'd probably want that control in
some form other than listing a specific list of allowable connection
strings -- I'd want to say things like "you have to use SSL" or "no
connecting back to the local host," because that lets me enforce some
general organizational policy without having to care specifically
about how each subscription is being set up.

Unfortunately, I have even less of an idea about what the run-as
concept is supposed to accomplish. I mean, at one level, I see it
quite clearly: the user creating the subscription wants replication to
have restricted privileges when it's running, and so they make the
run-as user some role with fewer privileges than their own. Brilliant.
But then I get stuck: against what kind of attack does that actually
protect us? If I'm a high privilege user, perhaps even a superuser,
and it's not safe to have logical replication running as me, then it
seems like the security model of logical replication is fundamentally
busted and we need to fix that. It can't be right to say that if you
have 263 users in a database and you want to replicate the whole
database to some other node, you need 263 different subscriptions with
a different run-as user for each. You need to be able to run all of
that logical replication as the superuser or some other high-privilege
user and not end up with a security compromise. And if we suppose that
that already works and is safe, well then what's the case where I do
need a run-as user?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: JSONPath Child Operator?

2023-01-30 Thread David E . Wheeler
On Jan 30, 2023, at 08:17, Filipp Krylov  wrote:

>> My question: Are there plans to support square bracket syntax for JSON 
>> object field name strings like this? Or to update to follow the standard as 
>> it’s finalized?
> 
> This syntax is a part of "jsonpath syntax extensions" patchset: 
> https://www.postgresql.org/message-id/e0fe4f7b-da0b-471c-b3da-d8adaf314357%40postgrespro.ru

Nice, thanks. I have learned since sending this email that SQL/JSON Path is
not at all the same as plain JSON Path, so now I'm less concerned about it.
I like the new object subscript syntax, though; I've been thinking about this
myself.

D





Re: Timeline ID hexadecimal format

2023-01-30 Thread Sébastien Lardière

On 27/01/2023 15:55, Peter Eisentraut wrote:

On 27.01.23 14:52, Sébastien Lardière wrote:
The attached patch proposes to change the format of timelineid from 
%u to %X.


I think your complaint has merit.  But note that if we did a change 
like this, then log files or reports from different versions would 
have different meaning without a visual difference, which is kind of 
what you complained about in the first place.  At least we should do 
something like 0x%X.



Hi,

Here's the patch with the suggested format; I also added a note in
the documentation about recovery_target_timeline, because I don't see
how strtoul(), with the special base-0 parameter, can parse a hexadecimal
value without a 0x prefix; I suppose nobody uses it.
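
For anyone else wondering about the base-0 behaviour, a tiny standalone
sketch (not part of the patch) shows why hexadecimal input needs the 0x
prefix there:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	/* With base 0, strtoul() infers the base from the string's prefix. */
	printf("%lu\n", strtoul("11", NULL, 0));	/* decimal -> 11 */
	printf("%lu\n", strtoul("011", NULL, 0));	/* octal   -> 9  */
	printf("%lu\n", strtoul("0x11", NULL, 0));	/* hex     -> 17 */
	return 0;
}

So a bare timeline value is read as decimal (or octal with a leading zero),
and only the 0x form is treated as hexadecimal.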


I also changed pg_controldata and pg_upgrade's use of its output.
I left internal usages unchanged: the content of the backup manifest and the
content of the history files.


Should I open a commitfest entry, or is it too soon?

regards,

--
Sébastien
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 you like, add comments to a history file to record your own notes about
 how and why this particular timeline was created.  Such comments will be
 especially valuable when you have a thicket of different timelines as
-a result of experimentation.
+a result of experimentation. In both WAL segment file names and history files,
+the timeline ID number is expressed in hexadecimal.

 

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..508774cfee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server. A numerical value expressed in hexadecimal must be
+prefixed with 0x, for example 0x11.
+latest is the default.

 

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..bdbe993877 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 
 		appendStringInfo(buf, "redo %X/%X; "
-		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+		 "tli 0x%X; prev tli 0x%X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
 		 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli 0x%X; prev tli 0x%X; time %s",
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..c22cf4b2a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7819,7 +7819,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (checkPoint.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in shutdown checkpoint record",
 			checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
@@ -7906,7 +7906,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (xlrec.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in end-of-recovery record",
 			xlrec.ThisTimeLineID, replayTLI)));
 	}
 	else if (info == XLOG_NOOP)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
i

Re: Non-superuser subscription owners

2023-01-30 Thread Mark Dilger



> On Jan 30, 2023, at 7:44 AM, Robert Haas  wrote:
> 
> And if we suppose that
> that already works and is safe, well then what's the case where I do
> need a run-as user?

A) Alice publishes tables, and occasionally adds new tables to existing 
publications.

B) Bob manages subscriptions, and periodically runs "refresh publication".  Bob 
also creates new subscriptions for people when a row is inserted into the 
"please create a subscription for me" table which Bob owns, using a trigger 
that Bob created on that table.

C) Alice creates a "please create a subscription for me" table on the 
publishing database, adds lots of malicious requests, and adds that table to 
the publication.

D) Bob replicates the table, fires the trigger, creates the malicious 
subscriptions, and starts replicating all that stuff, too.

I think that having Charlie, not Bob, as the "run-as" user helps somewhere 
right around (D). 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Tom Lane
Aleksander Alekseev  writes:
>> I've found a solution:
>> 
>> ```
>> export SGML_CATALOG_FILES=/usr/local/etc/xml/catalog
>> export XMLLINT="xmllint --catalogs"
>> export XSLTPROC="xsltproc --catalogs"
>> ```

Hmm, there is no such directory on my Mac, and indeed this recipe
does not work here.  I tried to transpose it to MacPorts by
substituting /opt/local/etc/xml/catalog, which does exist --- but
the recipe still doesn't work.

I believe what is actually failing is that http://docbook.sourceforge.net
now redirects to https:, and the ancient xsltproc version provided by
Apple doesn't do https.  What you need to do if you want to use their
xsltproc is install a local copy of the SGML catalog files and
stylesheets, preferably in the place that xsltproc would look by default
(/etc/xml/catalog seems to be the standard one).  It would be good to
document how to do that, but this patch doesn't do so.

What we do actually have already is a recommendation to install
appropriate MacPorts or Homebrew packages:

https://www.postgresql.org/docs/devel/docguide-toolsets.html#DOCGUIDE-TOOLSETS-INST-MACOS

and it works okay for me as long as I use MacPorts' version of xsltproc.

regards, tom lane




Re: Making Vars outer-join aware

2023-01-30 Thread Tom Lane
Richard Guo  writes:
> Sorry for the delayed reply.  I don't have any more review comments at
> the moment, except a nitpicking one.

> In optimizer/README at line 729 there is a query as

> SELECT * FROM a
>   LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
> WHERE a.x = 1;

> I think it should be

> SELECT * FROM a
>   LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
> WHERE a.x = 1;

Oh, good catch, thanks.

> I have no objection to get it committed.

I'll push forward then.  Thanks for reviewing!

regards, tom lane




Re: pub/sub - specifying optional parameters without values.

2023-01-30 Thread Tom Lane
Peter Smith  writes:
> The v3 patch LGTM (just for the logical replication commands).

Pushed then.

regards, tom lane




Re: Deadlock between logrep apply worker and tablesync worker

2023-01-30 Thread vignesh C
On Mon, 30 Jan 2023 at 17:30, vignesh C  wrote:
>
> On Mon, 30 Jan 2023 at 13:00, houzj.f...@fujitsu.com
>  wrote:
> >
> > On Monday, January 30, 2023 2:32 PM Amit Kapila  
> > wrote:
> > >
> > > On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
> > > >
> > > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila  
> > > > wrote:
> > > > >
> > > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > > check in replorigin_drop_guts() which is a static function and
> > > > > called from only one place, so, will it be required to check at both 
> > > > > places?
> > > >
> > > > There is a possibility that the initial check to verify if replication
> > > > origin exists in replorigin_drop_by_name was successful but later one
> > > > of either table sync worker or apply worker process might have dropped
> > > > the replication origin,
> > > >
> > >
> > > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > > drop happens after the patch acquires the lock on the origin.
> >
> > Yes, I think the existence check in replorigin_drop_guts is unnecessary as 
> > we
> > already lock the origin before that. I think the check in 
> > replorigin_drop_guts
> > is a custom check after calling SearchSysCache1 to get the tuple, but the 
> > error
> > should not happen as no concurrent drop can be performed.
> >
> > To make it simpler, one idea is to move the code that getting the tuple from
> > system cache to the replorigin_drop_by_name(). After locking the origin, we
> > can try to get the tuple and do the existence check, and we can reuse
> > this tuple to perform origin delete. In this approach we only need to check
> > origin existence once after locking. BTW, if we do this, then we'd better 
> > rename the
> > replorigin_drop_guts() to something like replorigin_state_clear() as the 
> > function
> > only clear the in-memory information after that.
> >
> > The code could be like:
> >
> > ---
> > replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> > ...
> > /*
> >  * Lock the origin to prevent concurrent drops. We keep the lock 
> > until the
> >  * end of transaction.
> >  */
> > LockSharedObject(ReplicationOriginRelationId, roident, 0,
> >  AccessExclusiveLock);
> >
> > tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
> > if (!HeapTupleIsValid(tuple))
> > {
> > if (!missing_ok)
> > elog(ERROR, "cache lookup failed for replication 
> > origin with ID %d",
> >  roident);
> >
> > return;
> > }
> >
> > replorigin_state_clear(rel, roident, nowait);
> >
> > /*
> >  * Now, we can delete the catalog entry.
> >  */
> > CatalogTupleDelete(rel, &tuple->t_self);
> > ReleaseSysCache(tuple);
> >
> > CommandCounterIncrement();
>
> The attached updated patch has the changes to handle the same.

I had not merged one of the local changes that was present; please
find the updated patch including that change. Sorry for missing it.

Regards,
Vignesh
From 5ed769a1ed3b25e0a19ad4b235df8a4140870635 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 27 Jan 2023 15:17:09 +0530
Subject: [PATCH v3] Lock the replication origin record instead of locking the
 pg_replication_origin relation.

Lock the replication origin record instead of locking the
pg_replication_origin relation.
---
 src/backend/replication/logical/origin.c | 57 
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index b754c43840..5837818ecf 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -338,16 +338,14 @@ replorigin_create(const char *roname)
  * Helper function to drop a replication origin.
  */
 static void
-replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
+replorigin_state_clear(RepOriginId roident, bool nowait)
 {
-	HeapTuple	tuple;
 	int			i;
 
 	/*
-	 * First, clean up the slot state info, if there is any matching slot.
+	 * Clean up the slot state info, if there is any matching slot.
 	 */
 restart:
-	tuple = NULL;
 	LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
 
 	for (i = 0; i < max_replication_slots; i++)
@@ -402,19 +400,6 @@ restart:
 	}
 	LWLockRelease(ReplicationOriginLock);
 	ConditionVariableCancelSleep();
-
-	/*
-	 * Now, we can delete the catalog entry.
-	 */
-	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "cache lookup failed for replication origin with ID %d",
-			 roident);
-
-	CatalogTupleDelete(rel, &tuple->t_self);
-	ReleaseSysCache(tuple);
-
-	CommandCounterIncrement();
 }
 
 /*
@@ -427,24 +412,38 @@ replorigin_drop_by_name(const char *name, boo

Re: Optimizing PostgreSQL with LLVM's PGO+LTO

2023-01-30 Thread João Paulo Labegalini de Carvalho
> What compiler / version / flags / OS did you try?
>

I am running experiments on a machine with:

   - Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
   - Ubuntu 18.04.6 LTS
   - LLVM/Clang 15.0.6 (build from source)

These are the flags I am using:

CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
LDFLAGS = -fuse-ld=lld -Wl,-q


> FWIW, I've experimented with LTO and PGO a bunch, both with gcc and clang. I
> did hit a crash in gcc, but that did turn out to be a compiler bug, and
> actually reduced to something not even needing LTO.
>

Good to hear that it works. I just need to figure out what is going wrong
on my end then.


> I saw quite substantial speedups with PGO, but I only tested very specific
> workloads. IIRC it was >15% gain in concurrent readonly pgbench.
>

I successfully applied PGO only and obtained similar gains with TPC-C &
TPC-H workloads.

> I dimly recall failing to get some benefit out of bolt for some reason that
> I
> unfortunately don't even vaguely recall.
>

With BOLT I got gains similar to, or slightly higher than, PGO, but not for
all queries in TPC-H. In fact, I observed small (2-4%) regressions with BOLT
on some of them.

-- 
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carva...@ic.unicamp.br
joao.carva...@ualberta.ca


Re: Non-superuser subscription owners

2023-01-30 Thread Robert Haas
On Mon, Jan 30, 2023 at 11:11 AM Mark Dilger
 wrote:
> > On Jan 30, 2023, at 7:44 AM, Robert Haas  wrote:
> >
> > And if we suppose that
> > that already works and is safe, well then what's the case where I do
> > need a run-as user?
>
> A) Alice publishes tables, and occasionally adds new tables to existing 
> publications.
>
> B) Bob manages subscriptions, and periodically runs "refresh publication".  
> Bob also creates new subscriptions for people when a row is inserted into the 
> "please create a subscription for me" table which Bob owns, using a trigger 
> that Bob created on that table.
>
> C) Alice creates a "please create a subscription for me" table on the 
> publishing database, adds lots of malicious requests, and adds that table to 
> the publication.
>
> D) Bob replicates the table, fires the trigger, creates the malicious 
> subscriptions, and starts replicating all that stuff, too.
>
> I think that having Charlie, not Bob, as the "run-as" user helps somewhere 
> right around (D).

I suppose it does, but I have some complaints.

First, it doesn't seem to make a lot of sense to have one person
managing the publications and someone else managing the subscriptions,
and especially if those parties are mutually untrusting. I can't think
of any real reason to set things up that way. Sure, you could, but why
would you? You could, equally, decide that one member of your
household was going to decide what's for dinner every night, and some
other member of your household was going to decide what gets purchased
at the grocery store each week. If those two people exercise their
responsibilities without tight coordination, or with hostile intent
toward each other, things are going to go badly, but that's not an
argument for putting a combination lock on the flour canister. It's an
argument for getting along better, or not having such a dumb system in
the first place. I don't quite see how the situation you postulate in
(A) and (B) is any different. Publications and subscriptions are as
closely connected as food purchases and meals. The point of a
publication is for it to connect up to a subscription. In what
circumstances would be it be reasonable to give responsibility for
those objects to different and especially mutually untrusting users?

Second, in step (B), we may ask why Bob is doing this with a trigger.
If he's willing to create any subscription for which Alice asks, we
could have just given Alice the authority to do those actions herself.
Presumably, therefore, Bob is willing to create some subscriptions for
which Alice may ask and not others. Perhaps this whole arrangement is
just a workaround for the lack of a sensible system for controlling
which connection strings Alice can use, in which case what is really
needed here might be something like the separate connection object
which Jeff postulated or my idea of a reverse pg_hba.conf. That kind
of approach would give a better user interface to Alice, who wouldn't
have to rephrase all of her CREATE SUBSCRIPTION commands as insert
statements. Conversely, if Alice and Bob are truly dedicated to this
convoluted system of creating subscriptions, then Bob needs to put
logic into his trigger that's smart enough to block any malicious
requests that Alice may make. He really brought this problem on
himself by not doing that.

Third, in step (C), it seems to me that whoever set up Alice's
permissions has really messed up. Either the schema Bob is using for
his create-me-a-subscription table exists on the primary and Alice has
permission to create tables in that schema, or else that schema does
not exist on the primary and Alice has permission to create it. Either
way, that's a bad setup. Bob's table should be located in a schema for
which Alice has only USAGE permissions and shouldn't have excess
permissions on the table, either. Then this step can't happen. This
step could also be blocked if, instead of using a table with a
trigger, Bob wrote a security definer function or procedure and
granted EXECUTE permission on that function or procedure to Alice.
He's still going to need sanity checks, though, and if the function or
procedure inserts into a logging table or something, he'd better make
sure that table is adequately secured rather than being, say, a table
owned by Alice with malicious triggers on it.

So basically this doesn't really feel like a valid scenario to me.
We're supposed to believe that Alice is hostile to Bob, but the
superuser doesn't seem to have thought very carefully about how Bob is
supposed to defend himself against Alice, and Bob doesn't even seem to
be trying. Maybe we should rename the users to Samson and Delilah? :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: meson: Optionally disable installation of test modules

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 08:37:42 +0100, Peter Eisentraut wrote:
> One open issue (IMO) with the meson build system is that it installs the
> test modules under src/test/modules/ as part of a normal installation. This
> is because there is no way to set up the build system to install extra
> things only when told.  I think we still need a way to disable this somehow,
> so that building a production installation doesn't end up with a bunch of
> extra files.
> 
> The attached simple patch is a starting point for discussion.  It just
> disables the subdirectory src/test/modules/ based on some Boolean setting.
> This could be some new top-level option, or maybe part of PG_TEST_EXTRA, or
> something else?  With this, I get an identical set of installed files from
> meson.  I imagine this option would be false by default and developers would
> enable it.

Bilal, with a bit of help from me, worked on an alternative approach to
this. It's a lot more verbose in the initial change, but wouldn't increase the
amount of work/lines for new test modules. The main advantage is that we
wouldn't have to disable the modules by default, which I think would be quite
likely to result in plenty of people not running the tests.

Sending a link instead of attaching, in case you already registered a cfbot 
entry:
https://github.com/anarazel/postgres/commit/d1d192a860da39af9aa63d7edf643eed07c4

Probably worth adding an install-test-modules target for manual use.

Greetings,

Andres Freund




Re: Optimizing PostgreSQL with LLVM's PGO+LTO

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 10:24:02 -0700, João Paulo Labegalini de Carvalho wrote:
> > What compiler / version / flags / OS did you try?
> >
> 
> I am running experiment on a machine with:
> 
>- Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
>- Ubuntu 18.04.6 LTS
>- LLVM/Clang 15.0.6 (build from source)
> 
> These are the flags I am using:
> 
> CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
> LDFLAGS = -fuse-ld=lld -Wl,-q

For some reason my notes for using LTO include changing RANLIB to point to
gcc/llvm-ranlib of the appropriate version. Won't even be used on HEAD, but
before that it can make a difference.

Depending on how you built clang, it could be that the above recipe ends up
using the system lld, which might be too old.

What are the crashes you're getting?

Greetings,

Andres Freund




Re: Add n_tup_newpage_upd to pg_stat table views

2023-01-30 Thread Corey Huinker
On Fri, Jan 27, 2023 at 6:55 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-27 18:23:39 -0500, Corey Huinker wrote:
> > This patch adds the n_tup_newpage_upd to all the table stat views.
> >
> > Just as we currently track HOT updates, it should be beneficial to track
> > updates where the new tuple cannot fit on the existing page and must go
> to
> > a different one.
>
> I like that idea.
>
>
> I wonder if it's quite detailed enough - we can be forced to do out-of-page
> updates because we actually are out of space, or because we reach the max
> number of line pointers we allow in a page. HOT pruning can't remove dead
> line
> pointers, so that doesn't necessarily help.
>

I must be missing something; I only see the check for running out of space,
not the check for exhausting line pointers. I agree dividing them would be
interesting.



> Similarly, it's a bit sad that we can't distinguish between the number of
> potential-HOT out-of-page updates and the other out-of-page updates. But
> that'd mean even more counters.
>

I wondered that too, but the combinations of "would have been HOT but no
space" and "key update suggested not-HOT but it was id=id so today's
your lucky HOT" started to get away from me.

I wondered if there was interest in knowing if the tuple had to get
TOASTed, and further wondered if we would be interested in the number of
updates that had to wait on a lock. Again, more counters.


Re: Add n_tup_newpage_upd to pg_stat table views

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 13:40:15 -0500, Corey Huinker wrote:
> I must be missing something, I only see the check for running out of space,
> not the check for exhausting line pointers. I agree dividing them would be
> interesting.

See PageGetHeapFreeSpace(), particularly the header comment and the
MaxHeapTuplesPerPage check.
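
For readers following along, the check being referred to has roughly this
shape (a condensed paraphrase of the function in bufpage.c, not the exact
code): the function reports zero free space both when the page is genuinely
out of bytes and when all MaxHeapTuplesPerPage line pointers are already in
use, which is the second cause of out-of-page updates mentioned above.

#include "postgres.h"
#include "access/htup_details.h"	/* MaxHeapTuplesPerPage */
#include "storage/bufpage.h"

/*
 * Condensed paraphrase of PageGetHeapFreeSpace(); see bufpage.c for the
 * real logic and its use of the free-line-pointer hint bit.
 */
static Size
page_heap_free_space_sketch(Page page)
{
	Size		space = PageGetFreeSpace(page);

	if (space > 0 && PageGetMaxOffsetNumber(page) >= MaxHeapTuplesPerPage)
	{
		OffsetNumber offnum;
		bool		found_unused = false;

		/* all line pointer slots allocated; is at least one reusable? */
		for (offnum = FirstOffsetNumber;
			 offnum <= PageGetMaxOffsetNumber(page);
			 offnum = OffsetNumberNext(offnum))
		{
			if (!ItemIdIsUsed(PageGetItemId(page, offnum)))
			{
				found_unused = true;
				break;
			}
		}

		if (!found_unused)
			space = 0;			/* line pointers exhausted, not byte space */
	}

	return space;
}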


> > Similarly, it's a bit sad that we can't distinguish between the number of
> > potential-HOT out-of-page updates and the other out-of-page updates. But
> > that'd mean even more counters.
>
> I wondered that too, but the combinations of "would have been HOT but no
> space" and "key update suggested not-HOT but it was id=id so today's
> your lucky HOT" started to get away from me.

Not sure I follow the second part. Are you just worried about explaining when
a HOT update is possible?


> I wondered if there was interest in knowing if the tuple had to get
> TOASTed, and further wondered if we would be interested in the number of
> updates that had to wait on a lock. Again, more counters.

Those seem a lot less actionable / related to the topic at hand to me.

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-01-30 Thread Mark Dilger



> On Jan 30, 2023, at 9:26 AM, Robert Haas  wrote:
> 
> First, it doesn't seem to make a lot of sense to have one person
> managing the publications and someone else managing the subscriptions,
> and especially if those parties are mutually untrusting. I can't think
> of any real reason to set things up that way. Sure, you could, but why
> would you? You could, equally, decide that one member of your
> household was going to decide what's for dinner every night, and some
> other member of your household was going to decide what gets purchased
> at the grocery store each week. If those two people exercise their
> responsibilities without tight coordination, or with hostile intent
> toward each other, things are going to go badly, but that's not an
> argument for putting a combination lock on the flour canister. It's an
> argument for getting along better, or not having such a dumb system in
> the first place. I don't quite see how the situation you postulate in
> (A) and (B) is any different. Publications and subscriptions are as
> closely connected as food purchases and meals. The point of a
> publication is for it to connect up to a subscription.

I have a grim view of the requirement that publishers and subscribers trust 
each other.  Even when they do trust each other, they can firewall attacks by 
acting as if they do not.

> In what
> circumstances would it be reasonable to give responsibility for
> those objects to different and especially mutually untrusting users?

When public repositories of data, such as the IANA whois database, publish 
their data via postgres publications.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Introduce "log_connection_stages" setting.

2023-01-30 Thread Sergey Dudoladov
Hello, hackers

Thank you for the reviews. I've modified the patch to incorporate your
suggestions:
+ flag bits are now used to encode different connection stages
+ failing tests are now fixed. It was not a keyword issue but rather
"check_log_connection_messages" not allocating memory properly
   in the special case log_connection_messages =  'all'
+ the GUC option is now only GUC_LIST_INPUT
+ typo fixes and line rewrapping in the docs

Regards,
Sergey
From 4bf99bccb4b278188dbc679f00d506cd35b025f5 Mon Sep 17 00:00:00 2001
From: Sergey Dudoladov 
Date: Tue, 8 Nov 2022 18:56:26 +0100
Subject: [PATCH] Introduce 'log_connection_messages'

This patch removes 'log_connections' and 'log_disconnections'
in favor of 'log_connection_messages', thereby introducing incompatibility

Author: Sergey Dudoladov

Reviewed-by:

Discussion:
https://www.postgresql.org/message-id/flat/CAA8Fd-qCB96uwfgMKrzfNs32mqqysi53yZFNVaRNJ6xDthZEgA%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 89 +--
 src/backend/commands/variable.c   | 77 
 src/backend/libpq/auth.c  |  5 +-
 src/backend/postmaster/postmaster.c   |  5 +-
 src/backend/tcop/postgres.c   | 11 ++-
 src/backend/utils/init/postinit.c |  2 +-
 src/backend/utils/misc/guc_tables.c   | 29 +++---
 src/backend/utils/misc/postgresql.conf.sample |  5 +-
 src/include/postgres.h|  9 ++
 src/include/postmaster/postmaster.h   |  1 -
 src/include/utils/guc_hooks.h |  2 +
 src/test/authentication/t/001_password.pl |  2 +-
 src/test/authentication/t/003_peer.pl |  2 +-
 src/test/kerberos/t/001_auth.pl   |  2 +-
 src/test/ldap/t/001_auth.pl   |  2 +-
 src/test/recovery/t/013_crash_restart.pl  |  2 +-
 src/test/recovery/t/022_crash_temp_files.pl   |  2 +-
 src/test/recovery/t/032_relfilenode_reuse.pl  |  2 +-
 src/test/ssl/t/SSL/Server.pm  |  2 +-
 src/tools/ci/pg_ci_base.conf  |  3 +-
 20 files changed, 181 insertions(+), 73 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..8b60e814e9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -140,7 +140,7 @@
  An example of what this file might look like is:
 
 # This is a comment
-log_connections = yes
+log_connection_messages = all
 log_destination = 'syslog'
 search_path = '"$user", public'
 shared_buffers = 128MB
@@ -335,7 +335,7 @@ UPDATE pg_settings SET setting = reset_val WHERE name = 'configuration_parameter
passed to the postgres command via the
-c command-line parameter.  For example,
 
-postgres -c log_connections=yes -c log_destination='syslog'
+postgres -c log_connection_messages=all -c log_destination='syslog'
 
Settings provided in this way override those set via
postgresql.conf or ALTER SYSTEM,
@@ -7085,23 +7085,74 @@ local0.*/var/log/postgresql
   
  
 
- 
-  log_connections (boolean)
+ 
+  log_connection_messages (string)
   
-   log_connections configuration parameter
+   log_connection_messages configuration parameter
   
   
   

-Causes each attempted connection to the server to be logged,
-as well as successful completion of both client authentication (if
-necessary) and authorization.
+Causes stages of each attempted connection to the server to be logged. Example: authorized,disconnected.
+The default is the empty string meaning nothing is logged.
 Only superusers and users with the appropriate SET
 privilege can change this parameter at session start,
 and it cannot be changed at all within a session.
-The default is off.

 
+
+ Connection messages
+ 
+  
+  
+  
+   
+Name
+Description
+   
+  
+  
+   
+received
+Logs receipt of a connection. At this point a connection has been
+received, but no further work has been done: Postgres is about to start
+interacting with a client.
+   
+
+   
+authenticated
+Logs the original identity that an authentication method employs
+to identify a user. In most cases, the identity string matches the
+PostgreSQL username, but some third-party authentication methods may
+alter the original user identifier before the server stores it. Failed
+authentication is always logged regardless of the value of this setting.
+
+   
+
+   
+authorized
+Logs successful completion of authorization. At this point the
+connection has been fully established.
+
+   
+
+   
+disconnected
+Logs session termination. The log output provides information
+similar to authorized, plus the duration of the 

Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Aleksander Alekseev
Hi Tom,

Thanks for the feedback.

> Hmm, there is no such directory on my Mac, and indeed this recipe
> does not work here.  I tried to transpose it to MacPorts by
> substituting /opt/local/etc/xml/catalog, which does exist --- but
> the recipe still doesn't work.

Well, that's a bummer.

> What we do actually have already is a recommendation to install
> appropriate MacPorts or Homebrew packages:
>
> https://www.postgresql.org/docs/devel/docguide-toolsets.html#DOCGUIDE-TOOLSETS-INST-MACOS
>
> and it works okay for me as long as I use MacPorts' version of xsltproc.

Unfortunately it doesn't work for Homebrew anymore and there seems to
be only one xsltproc in the system.

> I believe what is actually failing is that http://docbook.sourceforge.net
> now redirects to https:, and the ancient xsltproc version provided by
> Apple doesn't do https.  What you need to do if you want to use their
> xsltproc is install a local copy of the SGML catalog files and
> stylesheets, preferably in the place that xsltproc would look by default
> (/etc/xml/catalog seems to be the standard one).  It would be good to
> document how to do that, but this patch doesn't do so.

Fair enough.

I would appreciate it if you could help figure out how to do this
for MacPorts, since I'm not a MacPorts user. I'll figure out how to do
this for Homebrew.

Does something like:

```
ln -s /opt/local/etc/xml/catalog /etc/xml/catalog
```

... work for you? Does your:

```
xsltproc --help
```

... also say that it uses /etc/xml/catalog path by default?

-- 
Best regards,
Aleksander Alekseev




Re: Non-superuser subscription owners

2023-01-30 Thread Mark Dilger



> On Jan 30, 2023, at 9:26 AM, Robert Haas  wrote:
> 
> So basically this doesn't really feel like a valid scenario to me.
> We're supposed to believe that Alice is hostile to Bob, but the
> superuser doesn't seem to have thought very carefully about how Bob is
> supposed to defend himself against Alice, and Bob doesn't even seem to
> be trying. Maybe we should rename the users to Samson and Delilah? :-)

No, Atahualpa and Pizarro.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-30 Thread Nathan Bossart
On Mon, Jan 23, 2023 at 07:28:06PM -0800, Andres Freund wrote:
> After a tiny bit further polishing, and after separately pushing a resource
> leak fix for walrcv_connect(), I pushed this.

My colleague Robins Tharakan (CC'd) noticed crashes when testing recent
commits, and he traced it back to e460248.  From this information, I
discovered the following typo:

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 8982d623d3..78a8bcee6e 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -321,7 +321,7 @@ dblink_connect(PG_FUNCTION_ARGS)
 else
 {
 if (pconn->conn)
-libpqsrv_disconnect(conn);
+libpqsrv_disconnect(pconn->conn);
 pconn->conn = conn;
 }

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Non-superuser subscription owners

2023-01-30 Thread Robert Haas
On Mon, Jan 30, 2023 at 1:46 PM Mark Dilger
 wrote:
> I have a grim view of the requirement that publishers and subscribers trust 
> each other.  Even when they do trust each other, they can firewall attacks by 
> acting as if they do not.

I think it's OK if the CREATE PUBLICATION user doesn't particularly
trust the CREATE SUBSCRIPTION user, because the publication is just a
grouping of tables to which somebody can pay attention or not. The
CREATE PUBLICATION user isn't compromised either way. But, at least as
things stand, I don't see how the CREATE SUBSCRIPTION user can get away
with not trusting the CREATE PUBLICATION user. CREATE SUBSCRIPTION
provides no tools at all for filtering the data that the publisher
chooses to send.

Now that can be changed, I suppose, and a run-as user would be one way
to make progress in that direction. But I'm not sure how viable that
is, because...

> > In what
> > circumstances would it be reasonable to give responsibility for
> > those objects to different and especially mutually untrusting users?
>
> When public repositories of data, such as the IANA whois database, publish 
> their data via postgres publications.

... for that to work, IANA would need to set up the database so that
untrusted parties can create logical replication slots on their
PostgreSQL server. And I think that granting REPLICATION privilege on
your database to random people on the Internet is not really viable,
nor intended to be viable.  As the CREATE ROLE documentation says, "A
role having the REPLICATION attribute is a very highly privileged
role."

Concretely, this kind of setup would have the problem that you could
kill the IANA database by just creating a replication slot and then
not using it (or replicating from it only very very slowly).
Eventually, the replication slot would either hold back xmin enough
that you got a lot of bloat, or cause enough WAL to be retained that
you ran out of disk space. Maybe you could protect yourself against
that kind of problem by cutting off users who get too far behind, but
that also cuts off people who just have an outage for longer than your
cutoff.

Also, anyone who can connect to a replication slot can also connect
to any other replication slot, and drop any replication slot. So if
IANA did grant REPLICATION privilege to random people on the Internet,
one of them could jump into the system and screw things up for all the
others.

This kind of setup just doesn't seem viable to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: recovery modules

2023-01-30 Thread Nathan Bossart
On Mon, Jan 30, 2023 at 04:51:38PM +0900, Michael Paquier wrote:
> Now, I find this part, where we use a double pointer to allow the
> module initialization to create and give back a private area, rather
> confusing, and I think that this could be bug-prone, as well.  Once
> you incorporate some data within the set of callbacks, isn't this
> stuff close to a "state" data, or just something that we could call
> only an "ArchiveModule"?  Could it make more sense to have
> _PG_archive_module_init return a structure with everything rather than
> a separate in/out argument?  Here is the idea, simply:
> typedef struct ArchiveModule {
>   ArchiveCallbacks *routines;
>   void *private_data;
>   /* Potentially more here, like some flags? */
> } ArchiveModule;

Yeah, we could probably invent an ArchiveModuleContext struct.  I think
this is similar to how LogicalDecodingContext is used.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Making background psql nicer to use in tap tests

2023-01-30 Thread Andres Freund
Hi,

Plenty of tap tests require a background psql. But they're pretty annoying to
use.

I think the biggest improvement would be an easy way to run a single query and
get the result of that query. Manually having to pump_until() is awkward and
often leads to hangs/timeouts, instead of test failures, because one needs to
specify a match pattern to pump_until(), which on mismatch leads to trying to
keep pumping forever.

It's annoyingly hard to wait for the result of a query in a generic way with
background_psql(), and more generally for psql. background_psql() uses -XAtq,
which means that we'll not get "status" output (like "BEGIN" or "(1 row)"),
and that queries not returning anything are completely invisible.

A second annoyance is that issuing a query requires a trailing newline,
otherwise psql won't process it.


The best way I can see is to have a helper that issues the query, followed by
a trailing newline, an \echo with a recognizable separator, and then uses
pump_until() to wait for that separator.


Another area worthy of improvement is that background_psql() requires passing
in many variables externally - without a recognizable benefit afaict. What's
the point in 'stdin', 'stdout', 'timer' being passed in? stdin/stdout need to
point to empty strings, so we know what's needed - in fact we'll even reset
them if they're passed in.  The timer is always going to be
PostgreSQL::Test::Utils::timeout_default, so again, what's the point?

I think it'd be far more usable if we made background_psql() return a hash
with the relevant variables. The 031_recovery_conflict.pl test has:

my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
my %psql_standby = ('stdin' => '', 'stdout' => '');
$psql_standby{run} =
  $node_standby->background_psql($test_db, \$psql_standby{stdin},
\$psql_standby{stdout},
$psql_timeout);
$psql_standby{stdout} = '';

How about just returning a reference to a hash like that? Except that I'd also
make stderr available, which one can't currently access.


The $psql_standby{stdout} = ''; is needed because background_psql() leaves a
banner in the output, which it shouldn't, but we probably should just fix
that.


Brought to you by: Trying to write a test for vacuum_defer_cleanup_age.

- Andres




Re: recovery modules

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 16:51:38 +0900, Michael Paquier wrote:
> On Fri, Jan 27, 2023 at 10:27:29PM -0800, Nathan Bossart wrote:
> > Here is a work-in-progress patch set for adjusting the archive modules
> > interface.  Is this roughly what you had in mind?
> 
> I have been catching up with what is happening here.  I can get
> behind the idea to use the term "callbacks" vs "context" for clarity,
> to move the callback definitions into their own header, and to add
> extra arguments to the callback functions for some private data.
> 
> -void
> -_PG_archive_module_init(ArchiveModuleCallbacks *cb)
> +const ArchiveModuleCallbacks *
> +_PG_archive_module_init(void **arg)
>  {
> AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
>  
> -   cb->check_configured_cb = basic_archive_configured;
> -   cb->archive_file_cb = basic_archive_file;
> +   (*arg) = (void *) AllocSetContextCreate(TopMemoryContext,
> +   "basic_archive",
> +   ALLOCSET_DEFAULT_SIZES);
> +
> +   return &basic_archive_callbacks;

> Now, I find this part, where we use a double pointer to allow the
> module initialization to create and give back a private area, rather
> confusing, and I think that this could be bug-prone, as well.

I don't think _PG_archive_module_init() should actually allocate a memory
context and do other similar initializations. Instead it should just return
'const ArchiveModuleCallbacks*', typically a single line.

Allocations etc should happen in one of the callbacks. That way we can
actually have multiple instances of a module.
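
Concretely, that would shrink the init function to something like the
following (an illustrative sketch of the suggestion, not the final API; the
callback names are taken from the WIP patch quoted above):

/*
 * Sketch: _PG_archive_module_init() only hands back a static callback
 * table, so multiple instances of the module become possible.  Any
 * per-instance allocations would move into one of the callbacks.
 */
static const ArchiveModuleCallbacks basic_archive_callbacks = {
	.check_configured_cb = basic_archive_configured,
	.archive_file_cb = basic_archive_file,
};

const ArchiveModuleCallbacks *
_PG_archive_module_init(void)
{
	return &basic_archive_callbacks;
}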


>  Once
> you incorporate some data within the set of callbacks, isn't this
> stuff close to a "state" data, or just something that we could call
> only an "ArchiveModule"?  Could it make more sense to have
> _PG_archive_module_init return a structure with everything rather than
> a separate in/out argument?  Here is the idea, simply:
> typedef struct ArchiveModule {
>   ArchiveCallbacks *routines;
>   void *private_data;
>   /* Potentially more here, like some flags? */
> } ArchiveModule;

I don't like this much. This still basically ends up with the module callbacks
not being sufficient to instantiate an archive module.

Greetings,

Andres Freund




Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 11:30:08 -0800, Nathan Bossart wrote:
> On Mon, Jan 23, 2023 at 07:28:06PM -0800, Andres Freund wrote:
> > After a tiny bit further polishing, and after separately pushing a resource
> > leak fix for walrcv_connect(), I pushed this.
> 
> My colleague Robins Tharakan (CC'd) noticed crashes when testing recent
> commits, and he traced it back to e460248.  From this information, I
> discovered the following typo:
> 
> diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
> index 8982d623d3..78a8bcee6e 100644
> --- a/contrib/dblink/dblink.c
> +++ b/contrib/dblink/dblink.c
> @@ -321,7 +321,7 @@ dblink_connect(PG_FUNCTION_ARGS)
>  else
>  {
>  if (pconn->conn)
> -libpqsrv_disconnect(conn);
> +libpqsrv_disconnect(pconn->conn);
>  pconn->conn = conn;
>  }

Ugh. Good catch.

Why don't the dblink tests catch this?  Any chance you or Robins could prepare
a patch with fix and test, given that you know how to trigger this?

Greetings,

Andres Freund




Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-30 Thread Nathan Bossart
On Mon, Jan 30, 2023 at 11:49:37AM -0800, Andres Freund wrote:
> Why don't the dblink tests catch this?  Any chance you or Robins could prepare
> a patch with fix and test, given that you know how to trigger this?

It's trivially reproducible by calling 1-argument dblink_connect() multiple
times and then calling dblink_disconnect().  Here's a patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 8982d623d3..78a8bcee6e 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -321,7 +321,7 @@ dblink_connect(PG_FUNCTION_ARGS)
 	else
 	{
 		if (pconn->conn)
-			libpqsrv_disconnect(conn);
+			libpqsrv_disconnect(pconn->conn);
 		pconn->conn = conn;
 	}
 
diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index 14d015e4d5..0f5050b409 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -938,6 +938,25 @@ DROP SERVER fdtest;
 ALTER FOREIGN DATA WRAPPER dblink_fdw OPTIONS (nonexistent 'fdw');
 ERROR:  invalid option "nonexistent"
 HINT:  There are no valid options in this context.
+-- test repeated calls to dblink_connect
+SELECT dblink_connect(connection_parameters());
+ dblink_connect 
+
+ OK
+(1 row)
+
+SELECT dblink_connect(connection_parameters());
+ dblink_connect 
+
+ OK
+(1 row)
+
+SELECT dblink_disconnect();
+ dblink_disconnect 
+---
+ OK
+(1 row)
+
 -- test asynchronous notifications
 SELECT dblink_connect(connection_parameters());
  dblink_connect 
diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql
index e560260bfc..7870ce5d5a 100644
--- a/contrib/dblink/sql/dblink.sql
+++ b/contrib/dblink/sql/dblink.sql
@@ -489,6 +489,11 @@ DROP SERVER fdtest;
 -- should fail
 ALTER FOREIGN DATA WRAPPER dblink_fdw OPTIONS (nonexistent 'fdw');
 
+-- test repeated calls to dblink_connect
+SELECT dblink_connect(connection_parameters());
+SELECT dblink_connect(connection_parameters());
+SELECT dblink_disconnect();
+
 -- test asynchronous notifications
 SELECT dblink_connect(connection_parameters());
 


Re: recovery modules

2023-01-30 Thread Nathan Bossart
On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote:
> I don't think _PG_archive_module_init() should actually allocate a memory
> context and do other similar initializations. Instead it should just return
> 'const ArchiveModuleCallbacks*', typically a single line.
> 
> Allocations etc should happen in one of the callbacks. That way we can
> actually have multiple instances of a module.

I think we'd need to invent a startup callback for archive modules for this
to work, but that's easy enough.
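
Something along these lines, purely as a sketch of the idea (the names
ArchiveModuleState and ArchiveStartupCB are made up here for illustration,
not an agreed interface); the allocation shown is the one the WIP patch
currently does in _PG_archive_module_init():

/* Hypothetical startup callback carrying the per-instance setup. */
typedef struct ArchiveModuleState
{
	void	   *private_data;	/* filled in by the startup callback */
} ArchiveModuleState;

typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);

static void
basic_archive_startup(ArchiveModuleState *state)
{
	state->private_data = AllocSetContextCreate(TopMemoryContext,
												"basic_archive",
												ALLOCSET_DEFAULT_SIZES);
}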

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Making background psql nicer to use in tap tests

2023-01-30 Thread Tom Lane
Andres Freund  writes:
> It's annoyingly hard to wait for the result of a query in a generic way with
> background_psql(), and more generally for psql. background_psql() uses -XAtq,
> which means that we'll not get "status" output (like "BEGIN" or "(1 row)"),
> and that queries not returning anything are completely invisible.

Yeah, the empty-query-result problem was giving me fits recently.
+1 for wrapping this into something more convenient to use.

regards, tom lane




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> On Tue, 10 Jan 2023 at 20:14, Andres Freund  wrote:
> > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > you might mean more than that?
> 
> h->oldest_considered_running can be extremely old due to the global
> nature of the value and the potential existence of a snapshot in
> another database that started in parallel to a very old running
> transaction.

Here's a version that, I think, does not have that issue.

In an earlier, not posted, version I had a vacuum_defer_cleanup_age-specific
helper function for this, but it seems likely we'll need it in other places
too.  So I named it TransactionIdRetreatSafely(). I made it accept the xid by
pointer, as the line lengths / repetition otherwise end up making it hard to
read the code.  For now I have TransactionIdRetreatSafely() be private to
procarray.c, but I expect we'll have to change that eventually.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe
TransactionIdRetreatClamped() is better?


I've been working on a test for vacuum_defer_cleanup_age. It does catch the
corruption at hand, but not much more.  It's quite painful to write, right
now.  Some of the reasons:
https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de



> > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > seems like a mighty invasive change to backpatch.
> 
> I'm not sure either. Protecting against underflow by halving the
> effective valid value space is quite the intervention, but if it is
> necessary to make this work in a performant manner, it would be worth
> it. Maybe someone else with more experience can provide their opinion
> here.

The attached assertions just remove 1/2**32'th of the space, by reserving
the xid range with the upper 32 bits set as something that shouldn't be
reachable.

Still requires us to change the input routines to reject that range, but I
think that's a worthy tradeoff.  I didn't find the existing limits for the
type to be documented anywhere.

Obviously something like that could only go into HEAD.

Greetings,

Andres Freund
>From 8be634900796630a772c0131925f38f136fed599 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 9 Jan 2023 10:23:10 -0800
Subject: [PATCH v3 1/6] WIP: Fix corruption due to vacuum_defer_cleanup_age
 underflowing 64bit xids

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63...@awork3.anarazel.de
Backpatch:
---
 src/backend/storage/ipc/procarray.c | 85 +
 1 file changed, 73 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf96416..64d0896b23b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -367,6 +367,9 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l
 static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
+static void TransactionIdRetreatSafely(TransactionId *xid,
+	   int retreat_by,
+	   FullTransactionId rel);
 
 static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
   TransactionId xid);
@@ -1888,17 +1891,35 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * so guc.c should limit it to no more than the xidStopLimit threshold
 		 * in varsup.c.  Also note that we intentionally don't apply
 		 * vacuum_defer_cleanup_age on standby servers.
+		 *
+		 * Need to use TransactionIdRetreatSafely() instead of open-coding the
+		 * subtraction, to prevent creating an xid before
+		 * FirstNormalTransactionId.
 		 */
-		h->oldest_considered_running =
-			TransactionIdRetreatedBy(h->oldest_considered_running,
-	 vacuum_defer_cleanup_age);
-		h->shared_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->shared_oldest_nonremovable,
-	 vacuum_defer_cleanup_age);
-		h->data_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
-	 vacuum_defer_cleanup_age);
-		/* defer doesn't apply to temp relations */
+		Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+			 h->shared_oldest_nonremovable));
+		Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+			 h->data_oldest_nonremovable));
+
+		if (vacuum_defer_cleanup_age > 0)
+		{
+			TransactionIdRetreatSafely(&h->oldest_considered_running,
+	   vacuum_defer_cleanup_age,
+	   h->latest_completed);
+			TransactionIdRetreatSafely(&h->shared_oldest_nonremovable,
+	   vacuum_defer_cleanup_age,
+	   h->latest_completed);
+			Transact

Re: Add SHELL_EXIT_CODE to psql

2023-01-30 Thread Corey Huinker
>
>
> Unfortunately, there is a failure on FreeBSD
> https://cirrus-ci.com/task/6466749487382528
>
> Maybe this patch needs to be rebased?
>
>
That failure is in postgres_fdw, which this code doesn't touch.

I'm not able to get to
/tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out
- It's not in either of the browseable zip files and the 2nd zip file isn't
downloadable, so it's hard to see what's going on.

I rebased, but there are no code differences.
From e99c7b8aa3b81725773c03583bcff50a88a58a38 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v8 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.1

From 496a84ccf758ed34d2c9bff0e9b458bef5fa9e58 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:04:31 -0500
Subject: [PATCH v8 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..1eb7ff3fa2 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.1

From e3dc0f2117b52f9661a75e79097741ed6c0f3b20 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 23 Jan 2023 16:46:16 -0500
Subject: [PATCH v8 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 14 
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 35 +-
 src/test/regress/expected/psql.out | 29 +
 src/test/regress/sql/psql.sql  | 25 +
 6 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code return by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..ea79d4fe57 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5032,6 +5032,20 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	if (result == 0)
+	{
+		Se

Allow an extension to be updated without a script

2023-01-30 Thread Yugo NAGATA
Hi hackers,

I propose to add a new option "updates_without_script" to the extension's
control file, specifying a list of updates that do not need an update script.
This makes it possible to update an extension with ALTER EXTENSION even if the
extension module doesn't provide the update script.

Currently, even when we don't need to execute any command to update an
extension from one version to the next, we need to provide an update
script that doesn't contain any command. Preparing such meaningless
files is sometimes annoying.

The attached patch introduces a new option "updates_without_script"
into the extension control file. This specifies a list of such updates
following the pattern 'old_version--target_version'. 

For example, 

 updates_without_script = '1.1--1.2, 1.3--1.4'
 
means that updates from version 1.1 to version 1.2 and from version 1.3
to version 1.4 don't need an update script.  In this case, users don't
need to prepare update scripts extension--1.1--1.2.sql and
extension--1.3--1.4.sql if it is not necessary to execute any commands.

The update path of an extension is determined based on both the filenames
of update scripts and the list of updates specified in updates_without_script.
The presence of an update script has higher priority than the option. Therefore,
if an update script is provided, the script will be executed even if this
update is specified in updates_without_script.

What do you think of this feature?
Any feedback would be appreciated.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 938e15b28f66015044b559e4c523fc74590691fc Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Mon, 30 Jan 2023 17:36:55 +0900
Subject: [PATCH] Allow an extension to be updated without a script

When we don't need to execute any command to update an extension from one
version to the next, we can specify a list of such updates following
the pattern 'old_version--target_version' into a new option
updates_without_script.  For example, specifying '1.1--1.2, 1.3--1.4'
means updates from version 1.1 to version 1.2, and from version 1.3
to version 1.4 don't need an update script.  User doesn't need to
provide an update script that doesn't contain any command for such
updates.

The updated path is determined based on both the names of update scripts
and the list in updates_without_script.  If an update script is provided,
the script will be executed even if this update is specified in
updates_without_script.
---
 doc/src/sgml/extend.sgml  |  30 +++-
 src/backend/commands/extension.c  | 161 +++---
 src/test/modules/test_extensions/Makefile |   3 +-
 .../expected/test_extensions.out  |   6 +
 .../test_extensions/sql/test_extensions.sql   |   8 +
 .../test_extensions/test_ext9--1.0.sql|   0
 .../test_extensions/test_ext9--2.0--3.0.sql   |   0
 .../modules/test_extensions/test_ext9.control |   5 +
 8 files changed, 182 insertions(+), 31 deletions(-)
 create mode 100644 src/test/modules/test_extensions/test_ext9--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext9--2.0--3.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext9.control

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 46e873a166..1c4c978264 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -807,6 +807,17 @@ RETURNS anycompatible AS ...

   
  
+
+ 
+  updates_without_script (string)
+  
+   
+A list of updates that do not need update scripts following the pattern
+old_version--target_version,
+for example updates_without_script = '1.1--1.2, 1.3--1.4'.
+   
+  
+ 
 
 
 
@@ -818,8 +829,9 @@ RETURNS anycompatible AS ...
  Secondary control files follow the same format as the primary control
  file.  Any parameters set in a secondary control file override the
  primary control file when installing or updating to that version of
- the extension.  However, the parameters directory and
- default_version cannot be set in a secondary control file.
+ the extension.  However, the parameters directory,
+ default_version, and updates_without_script
+ cannot be set in a secondary control file.
 
 
 
@@ -1092,6 +1104,20 @@ SELECT pg_catalog.pg_extension_config_dump('my_config', 'WHERE NOT standard_entr
  objects, they are automatically dissociated from the extension.
 
 
+
+ If you don't need to execute any command to update an extension from one
+ version to the next, provide an update script that doesn't contain
+ any command or specify a list of such updates following the pattern
+ old_version--target_version
+ into updates_without_script.  For example,
+ updates_without_script = '1.1--1.2, 1.3--1.4'
+ means updates from version 1.1 to version 1.2
+ and from version 1.3 to version 1.4
+ don't need an update script. Note that even if an update is specified in
+ updates_without_script, i

Re: Non-superuser subscription owners

2023-01-30 Thread Mark Dilger



> On Jan 30, 2023, at 11:30 AM, Robert Haas  wrote:
> 
> CREATE SUBSCRIPTION
> provides no tools at all for filtering the data that the publisher
> chooses to send.
> 
> Now that can be changed, I suppose, and a run-as user would be one way
> to make progress in that direction. But I'm not sure how viable that
> is, because...
> 
>>> In what
>>> circumstances would it be reasonable to give responsibility for
>>> those objects to different and especially mutually untrusting users?
>> 
>> When public repositories of data, such as the IANA whois database, publish 
>> their data via postgres publications.
> 
> ... for that to work, IANA would need to set up the database so that
> untrusted parties can create logical replication slots on their
> PostgreSQL server. And I think that granting REPLICATION privilege on
> your database to random people on the Internet is not really viable,
> nor intended to be viable.

That was an aspirational example in which there's infinite daylight between the 
publisher and subscriber.  I, too, doubt that's ever going to be possible.  But 
I still think we should aspire to some extra daylight between the two.  Perhaps 
IANA doesn't publish to the whole world, but instead publishes only to 
subscribers who have a contract in place, and have agreed to monetary penalties 
should they abuse the publishing server.  Whatever.  There's going to be some 
amount of daylight possible if we design for it, and none otherwise.

My real argument here isn't against your goal of having non-superusers who can 
create subscriptions.  That part seems fine to me.

Given that my work last year made it possible for subscriptions to run as 
somebody other than the subscription creator, it annoys me that you now want 
the subscription creator's privileges to be what the subscription runs as.  
That seems to undo what I worked on.  In my mental model of a 
(superuser-creator, non-superuser-owner) pair, it seems you're logically only 
touching the lefthand side, so you should then have a (nonsuperuser-creator, 
nonsuperuser-owner) pair.  But you don't.  You go the apparently needless extra 
step of just squashing them together.  I just don't see why it needs to be like 
that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Non-superuser subscription owners

2023-01-30 Thread Robert Haas
On Fri, Jan 27, 2023 at 5:00 PM Andres Freund  wrote:
> > Or, another thought, maybe this should be checking for CREATE on the
> > current database + also pg_create_subscription. That seems like it
> > might be the right idea, actually.
>
> Yes, that seems like a good idea.

Done in this version. I also changed check_conninfo to take an extra
argument instead of returning a Boolean, as per your suggestion.

I had a long think about what to do with ALTER SUBSCRIPTION ... OWNER
TO in terms of permissions checks. The previous version required that
the new owner have permissions of pg_create_subscription, but there
seems to be no particular reason for that rule except that it happens
to be what I made the code do. So I changed it to say that the current
owner must have CREATE privilege on the database, and must be able to
SET ROLE to the new owner. This matches the rule for CREATE SCHEMA.
Possibly we should *additionally* require that the person performing
the rename still have pg_create_subscription, but that shouldn't be
the only requirement. This change means that you can't just randomly
give your subscription to the superuser (with or without concurrently
attempting some other change as per your other comments) which is good
because you can't do that with other object types either.

There seems to be a good deal of inconsistency here. If you want to
give someone a schema, YOU need CREATE on the database. But if you
want to give someone a table, THEY need CREATE on the containing
schema. It makes sense that we check permissions on the containing
object, which could be a database or a schema depending on what you're
renaming, but it's unclear to me why we sometimes check on the person
performing the ALTER command and at other times on the recipient. It's
also somewhat unclear to me why we are checking CREATE in the first
place, especially on the donor. It might make sense to have a rule
that you can't own an object in a place where you couldn't have
created it, but there is no such rule, because you can give someone
CREATE on a schema, they can create an object, and then you can take
CREATE away and they still own an object there. So it kind of looks
to me like we made it up as we went along and that the result isn't
very consistent, but I'm inclined to follow CREATE SCHEMA here unless
there's some reason to do otherwise.

Another question around ALTER SUBSCRIPTION ... OWNER TO and also ALTER
SUBSCRIPTION .. RENAME is whether they ought to fail if you're not a
superuser and password_required false is set. They are separate code
paths from the rest of the ALTER SUBSCRIPTION cases, so if we want
that to be a rule we need dedicated code for it. I'm not quite sure
what's right. There's no comparable case for ALTER USER MAPPING
because a user mapping doesn't have an owner and so can't be
reassigned to a new owner. I don't see what the harm is, especially
for RENAME, but I might be missing something, and it certainly seems
arguable.

> > I'm not entirely clear whether there's a hazard there.
>
> I'm not at all either. It's just a code pattern that makes me anxious - I
> suspect there's a few places it makes us more vulnerable.

It looks likely to me that it was cut down from the CREATE SCHEMA code, FWIW.

> > If there is, I think we could fix it by moving the LockSharedObject call up
> > higher, above object_ownercheck. The only problem with that is it lets you
> > lock an object on which you have no permissions: see
> > 2ad36c4e44c8b513f6155656e1b7a8d26715bb94. To really fix that, we'd need an
> > analogue of RangeVarGetRelidExtended.
>
> Yea, we really should have something like RangeVarGetRelidExtended() for other
> kinds of objects. It'd take a fair bit of work / time to use it widely, but
> it'll take even longer if we start in 5 years ;)

We actually have something sort of like that in the form of
get_object_address(). It doesn't allow for a callback, but it does
have a retry loop.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Add-new-predefined-role-pg_create_subscriptions.patch
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-01-30 Thread Bruce Momjian
On Sat, Jan 28, 2023 at 05:06:03PM -0800, Noah Misch wrote:
> On Tue, Jan 24, 2023 at 02:04:02PM -0500, Bruce Momjian wrote:
> > On Tue, Jan 24, 2023 at 09:54:57AM -0500, Tom Lane wrote:
> > > As another example, the mechanisms we use to create the typedefs list
> > > in the first place are pretty squishy/leaky: they depend on which
> > > buildfarm animals are running the typedef-generation step, and on
> > > whether anything's broken lately in that code --- which happens on
> > > a fairly regular basis (eg [1]).  Maybe that could be improved,
> > > but I don't see an easy way to capture the set of system-defined
> > > typedefs that are in use on platforms other than your own.  I
> > > definitely do not want to go over to hand maintenance of that list.
> > 
> > As I now understand it, we would need to standardize on a typedef list
> > at the beginning of each major development cycle, and then only allow
> > additions,
> 
> Not to my knowledge.  There's no particular obstacle to updating the list more
> frequently or removing entries.

We would need to re-pgindent the tree each time, I think, which would
cause disruption if we did it too frequently.

> > and the addition would have to include any pgindent affects
> > of the addition.
> 
> Yes, a hook intended to enforce pgindent cleanliness should run tree-wide
> pgindent when the given commit(s) change the typedef list.  typedef list
> changes essentially become another kind of refactoring that can yield merge
> conflicts.  If your commit passed the pgindent check, rebasing it onto a new
> typedefs list may require further indentation changes.  New typedefs don't
> tend to change a lot of old code, so I would expect this sort of conflict to
> be minor, compared to all the other sources of conflicts.

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: heapgettup() with NoMovementScanDirection unused in core?

2023-01-30 Thread Melanie Plageman
v2 attached

On Fri, Jan 27, 2023 at 6:28 PM David Rowley  wrote:
>
> On Sat, 28 Jan 2023 at 12:15, Tom Lane  wrote:
> > /*
> >  * Determine the net effect of two direction specifications.
> >  * This relies on having ForwardScanDirection = +1, BackwardScanDirection = 
> > -1,
> >  * and will probably not do what you want if applied to any other values.
> >  */
> > #define CombineScanDirections(a, b)  ((a) * (b))
> >
> > The main thing this'd buy us is being able to grep for uses of the
> > trick.  If it's written as just multiplication, good luck being
> > able to find what's depending on that, should you ever need to.
>
> Yeah, I think the multiplication macro is a good way of doing it.
> Having the definition of it close to the ScanDirection enum's
> definition is likely a very good idea so that anyone adjusting the
> enum values is more likely to notice that it'll cause an issue. A
> small note on the enum declaration about the -1, +1 values being
> exploited in various places might be a good idea too. I see v6-0006 in
> [1] further exploits this, so that's further reason to document that.
>
> My personal preference would have been to call it
> ScanDirectionCombine, so the naming is more aligned to the 4 other
> macro names that start with ScanDirection in sdir.h, but I'm not going
> to fuss over it.

I've gone with this macro name.
I've also updated comments Tom mentioned and removed the test.
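
For what it's worth, a call site then composes the two directions in one
step; the sketch below is illustrative only (plannode stands for the
IndexScan plan node, not a variable from the patch):

/* net scan direction = executor direction combined with the plan's own */
ScanDirection direction = ScanDirectionCombine(estate->es_direction,
											   plannode->indexorderdir);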

As for the asserts, I was at a bit of a loss as to where to put an
assert which would make it clear that heapgettup() and
heapgettup_pagemode() do not handle NoMovementScanDirection but would
sit at a higher level of the executor. Do we not have to accommodate the
direction changing from tuple to tuple? If we don't expect the plan node
direction to change during execution, then why recalculate
estate->es_direction for each invocation of Index/SeqNext()?

As such, in this version I've put the asserts in heapgettup() and
heapgettup_pagemode().

I also realized that it doesn't really make sense to assert about the
index scan direction in ExecInitIndexOnlyScan() and ExecInitIndexScan()
-- so I've moved the assertion to planner when we make the index plan
from the path. I'm not sure if it is needed.

- Melanie
From d6aae3a73864aaaf84b4dc00ff299c2e8b4a5729 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 30 Jan 2023 09:49:54 -0500
Subject: [PATCH v2] remove nomovement scandir

---
 src/backend/access/heap/heapam.c | 71 ++--
 src/backend/commands/explain.c   |  3 -
 src/backend/executor/nodeIndexonlyscan.c | 11 +---
 src/backend/executor/nodeIndexscan.c | 11 +---
 src/backend/optimizer/path/indxpath.c|  8 +--
 src/backend/optimizer/plan/createplan.c  | 14 +
 src/backend/optimizer/util/pathnode.c|  5 +-
 src/include/access/sdir.h| 11 +++-
 src/include/nodes/pathnodes.h|  6 +-
 9 files changed, 41 insertions(+), 99 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e6024a980b..10aaeb14aa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -490,9 +490,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
  *		tuple as indicated by "dir"; return the next tuple in scan->rs_ctup,
  *		or set scan->rs_ctup.t_data = NULL if no more tuples.
  *
- * dir == NoMovementScanDirection means "re-fetch the tuple indicated
- * by scan->rs_ctup".
- *
  * Note: the reason nkeys/key are passed separately, even though they are
  * kept in the scan descriptor, is that the caller may not want us to check
  * the scankeys.
@@ -523,6 +520,8 @@ heapgettup(HeapScanDesc scan,
 	int			linesleft;
 	ItemId		lpp;
 
+	Assert(dir != NoMovementScanDirection);
+
 	/*
 	 * calculate next starting lineoff, given scan direction
 	 */
@@ -583,7 +582,7 @@ heapgettup(HeapScanDesc scan,
 
 		linesleft = lines - lineoff + 1;
 	}
-	else if (backward)
+	else
 	{
 		/* backward parallel scan not supported */
 		Assert(scan->rs_base.rs_parallel == NULL);
@@ -653,34 +652,6 @@ heapgettup(HeapScanDesc scan,
 
 		linesleft = lineoff;
 	}
-	else
-	{
-		/*
-		 * ``no movement'' scan direction: refetch prior tuple
-		 */
-		if (!scan->rs_inited)
-		{
-			Assert(!BufferIsValid(scan->rs_cbuf));
-			tuple->t_data = NULL;
-			return;
-		}
-
-		block = ItemPointerGetBlockNumber(&(tuple->t_self));
-		if (block != scan->rs_cblock)
-			heapgetpage((TableScanDesc) scan, block);
-
-		/* Since the tuple was previously fetched, needn't lock page here */
-		page = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page);
-		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
-		lpp = PageGetItemId(page, lineoff);
-		Assert(ItemIdIsNormal(lpp));
-
-		tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
-		tuple->t_len = ItemIdGetLength(lpp);
-
-		return;
-	}
 
 	/*
 	 * advance the scan until we find a qualifying tuple or run out of stuff
@@ -861,6 +832,8 @@ heapgettup_pagemod

Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-30 Thread Jacob Champion
On Fri, Jan 27, 2023 at 12:13 PM Cary Huang  wrote:
> > (Eventually I'd like to teach the server not to ask for a client
> >  certificate if it's not going to use it.)
>
> If clientcert is not requested by the server, but yet the client still
> sends the certificate, the server will still verify it. This is the case
> in this discussion.

I think this is maybe conflating the application-level behavior with
the protocol-level behavior. A client certificate is requested by the
server if ssl_ca_file is set, whether clientcert is set in the HBA or
not. It's this disconnect between the intuitive behavior and the
actual behavior that I'd like to eventually improve.

--Jacob




Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-01-30 Thread Tom Lane
Aleksander Alekseev  writes:
>> What we do actually have already is a recommendation to install
>> appropriate MacPorts or Homebrew packages:
>> https://www.postgresql.org/docs/devel/docguide-toolsets.html#DOCGUIDE-TOOLSETS-INST-MACOS
>> and it works okay for me as long as I use MacPorts' version of xsltproc.

> Unfortunately it doesn't work for Homebrew anymore and there seems to
> be only one xsltproc in the system.

Hmm.  Seems unlikely that Homebrew would have dropped the package(s)
altogether.  But ... poking at this, I discovered that there are
inaccuracies in our docs for MacPorts:

* /opt/local/bin/xsltproc is provided by libxslt, and
/opt/local/bin/xmllint is provided by libxml2, neither of which
will be installed by our recipe as given.  You might have pulled
those ports in already to build Postgres with, but if you didn't, the
recipe will fail.  I wonder if the Homebrew recipe has the same bug.

* At some point MacPorts renamed docbook-xsl to docbook-xsl-nons.
This is harmless at the moment, because if you ask for docbook-xsl
it will automatically install docbook-xsl-nons instead.  I wonder
if that'll be true indefinitely, though.

I also wonder whether we shouldn't point at the meta-package docbook-xml
instead of naming a particular version here (and having to update
that from time to time).  The extra disk space to install all the DTD
versions is entirely insignificant (< 2MB).

> Does your:
> xsltproc --help
> ... also say that it uses /etc/xml/catalog path by default?

Both /usr/bin/xsltproc and /opt/local/bin/xsltproc say

--catalogs : use SGML catalogs from $SGML_CATALOG_FILES
 otherwise XML Catalogs starting from 
 file:///etc/xml/catalog are activated by default

However, this appears to be a lie for /opt/local/bin/xsltproc;
what it's apparently *actually* using is /opt/local/etc/xml/catalog,
which is what MacPorts provides.

I repeated the test I did this morning, and this time using --catalogs
with SGML_CATALOG_FILES set to /opt/local/etc/xml/catalog worked for me,
using either copy of xsltproc.  I must've fat-fingered it somehow before.
Nonetheless, I doubt that that recipe is worth recommending to MacPorts
users: if they pull in the DTD packages they might as well pull in libxml2
and libxslt, and then they don't need to adjust anything.

In short, I think we need to update J.2.4 to say this for MacPorts:

sudo port install libxml2 libxslt docbook-xml docbook-xsl-nons fop

and I strongly suspect that the Homebrew recipe has a similar oversight.

regards, tom lane




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-30 Thread Jacob Champion
On Sun, Jan 29, 2023 at 5:02 AM Jim Jones  wrote:
> On 27.01.23 21:13, Cary Huang wrote:
>  > But, if the server does request clientcert but client uses
> "sslcertmode=disable" to connect and not give a certificate, it would
> also result in authentication failure. In this case, we actually would
> want to ignore "sslcertmode=disable" and send default certificates if
> found.
>
> I'm just wondering if this is really necessary. If the server asks for a
> certificate and the user explicitly says "I don't want to send it",
> shouldn't it be ok for the server return an authentication failure? I
> mean, wouldn't it defeat the purpose of "sslcertmode=disable"?

+1. In my opinion, if I tell libpq not to share my certificate with
the server, and it then fails to authenticate, that's intended and
useful behavior. (I don't really want libpq to try to find more ways
to authenticate me; that causes other security issues [1, 2].)

--Jacob

[1] 
https://www.postgresql.org/message-id/0adf992619e7bf138eb4119622d37e3efb6515d5.camel%40j-davis.com
[2] https://www.postgresql.org/message-id/46562.1637695110%40sss.pgh.pa.us




Re: Allow an extension to be updated without a script

2023-01-30 Thread Tom Lane
Yugo NAGATA  writes:
> Currently, even when we don't need to execute any command to update an
> extension from one version to the next, we need to provide an update
> script that doesn't contain any command. Preparing such meaningless
> files are sometimes annoying.

If you have no update script, why call it a new version?  The point
of extension versions is to distinguish different states of the
extension's SQL objects.  We do not consider mods in underlying C code
to justify a new version.

> The attached patch introduces a new option "updates_without_script"
> into extension control file. This specifies a list of such updates
> following the pattern 'old_version--target_version'.

This seems completely unnecessary and confusing.

regards, tom lane




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-01-30 Thread Jacob Champion
On Fri, Jan 27, 2023 at 1:08 PM Robert Haas  wrote:
> > 1) Forwarding the original ambient context along with the request, so
> > the server can check it too.
>
> Right, so a protocol extension. Reasonable idea, but a big lift. Not
> only do you need everyone to be running a new enough version of
> PostgreSQL, but existing proxies like pgpool and pgbouncer need
> updates, too.

Right.

> > 2) Explicitly combining the request with the proof of authority needed
> > to make it, as in capability-based security [3].
>
> As far as I can see, that link doesn't address how you'd make this
> approach work across a network.

The CSRF-token example I gave is one. But that's HTTP-specific
(stateless, server-driven) and probably doesn't make a lot of sense
for our case.

For our case, assuming that connections have side effects, one
solution could be for the client to signal to the server that the
connection should use in-band authentication only; i.e. fail the
connection if the credentials provided aren't good enough by
themselves to authenticate the client. (This has some overlap with
SASL negotiation, maybe.)

But that still requires server support. I don't know if there's a
clever way to tie the authentication to the request on the client side
only, using existing server implementations. (If connections don't
have side effects, require_auth should be sufficient.)

> > 3) Dropping as many implicitly-held privileges as possible before making
> > a request. This doesn't solve the problem but may considerably reduce
> > the practical attack surface.
>
> Right. I definitely don't object to this kind of approach, but I don't
> think it can ever be sufficient by itself.

I agree. (But for the record, I think that an outbound proxy filter is
also insufficient. Someone, somewhere, is going to want to safely
proxy through localhost _and_ have peer authentication set up.)

> > I think this style focuses on absolute configuration flexibility at the
> > expense of usability. It obfuscates the common use cases. (I have the
> > exact same complaint about our HBA and ident configs, so I may be
> > fighting uphill.)
>
> That's probably somewhat true, but on the other hand, it also is more
> powerful than what you're describing. In your system, is there some
> way the DBA can say "hey, you can connect to any of the machines on
> this list of subnets, but nothing else"? Or equally, "hey, you may NOT
> connect to any machine on this list of subnets, but anything else is
> fine"? Or "you can connect to these subnets without SSL, but if you
> want to talk to anything else, you need to use SSL"?

I guess I didn't call it out explicitly, so it was fair to assume that
it did not. I don't think we should ignore those cases.

But if we let the configuration focus on policies instead, and
simultaneously improve the confused-deputy problem, then any IP/host
filter functionality that we provide becomes an additional safety
measure instead of your only viable line of defense. "I screwed up our
IP filter, but we're still safe because the proxy refused to forward
its client cert to the backend." Or, "this other local application
requires peer authentication, but it's okay because the proxy
disallows those connections by default."

> Your idea
> seems to rely on us being able to identify all of the policies that a
> user is likely to want and give names to each one, and I don't feel
> very confident that that's realistic. But maybe I'm misinterpreting
> your idea?

No, that's pretty accurate. But I'm used to systems that provide a
ridiculous number of policies [1, 2] via what's basically a scoped
property bag. "Turn off option 1 and 2 globally. For host A and IP
address B, turn on option 1 as an exception." And I don't really
expect us to need as many options as those systems do.

I think that configuration style evolves well, it focuses on the right
things, and it can still handle IP lists intuitively [3], if that's
the way a DBA really wants to set up policies.

--Jacob

[1] https://httpd.apache.org/docs/2.4/mod/mod_proxy.html
[2] https://www.haproxy.com/documentation/hapee/latest/onepage/#4
[3] 
https://docs.nginx.com/nginx/admin-guide/security-controls/controlling-access-proxied-tcp/




Re: Non-superuser subscription owners

2023-01-30 Thread Robert Haas
On Mon, Jan 30, 2023 at 3:27 PM Mark Dilger
 wrote:
> That was an aspirational example in which there's infinite daylight between 
> the publisher and subscriber.  I, too, doubt that's ever going to be 
> possible.  But I still think we should aspire to some extra daylight between 
> the two.  Perhaps IANA doesn't publish to the whole world, but instead 
> publishes only to subscribers who have a contract in place, and have agreed 
> to monetary penalties should they abuse the publishing server.  Whatever.  
> There's going to be some amount of daylight possible if we design for it, and 
> none otherwise.
>
> My real argument here isn't against your goal of having non-superusers who 
> can create subscriptions.  That part seems fine to me.
>
> Given that my work last year made it possible for subscriptions to run as 
> somebody other than the subscription creator, it annoys me that you now want 
> the subscription creator's privileges to be what the subscription runs as.  
> That seems to undo what I worked on.  In my mental model of a 
> (superuser-creator, non-superuser-owner) pair, it seems you're logically only 
> touching the lefthand side, so you should then have a (nonsuperuser-creator, 
> nonsuperuser-owner) pair.  But you don't.  You go the apparently needless 
> extra step of just squashing them together.  I just don't see why it needs to 
> be like that.

I feel like you're accusing me of removing functionality that has
never existed. A subscription doesn't run as the subscription creator.
It runs as the subscription owner. If you or anyone else had added the
capability for it to run as someone other than the subscription owner,
I certainly wouldn't be trying to back that capability out as part of
this patch, and because there isn't, I'm not proposing to add that as
part of this patch. I don't see how that makes me guilty of squashing
anything together. The current state of affairs, where the run-as user
is taken from pg_subscription.subowner, the same field that is updated
by ALTER SUBSCRIPTION ... OWNER TO, is the result of your work, not
anything that I have done or am proposing to do.

I also *emphatically* disagree with the idea that this undoes what you
worked on. My patch would be *impossible* without your work. Prior to
your work, the run-as user was always, basically, the superuser, and
so the idea of allowing anyone other than a superuser to execute
CREATE SUBSCRIPTION would be flat-out nuts. Because of your work,
that's now a thing that we may be able to reasonably allow, if we can
work through the remaining issues. So I'm grateful to you, and also
sorry to hear that you're annoyed with me. But I still don't think
that the fact that the division you want doesn't exist is somehow my
fault.

I'm kind of curious why you *didn't* make this distinction at the time
that you did the other work in this area. Maybe my memory is
playing tricks on me again, but I seem to recall talking about the
idea with you at the time, and I seem to recall thinking that it
sounded like an OK idea. I seem to vaguely recall us discussing
hazards like: well, what if replication causes code to get executed as
the subscription owner that causes something bad to happen? But I
think the only way that happens is if they put triggers on the tables
that are being replicated, which is their choice, and they can avoid
installing problematic code there if they want. I think there might
have been some other scenarios, too, but I just can't remember. In any
case, I don't think the idea is completely without merit. I think it
could very well be something that we want to have for one reason or
another. But I don't currently understand exactly what those reasons
are, and I don't see any reason why one patch should both split owner
from run-as user and also allow the owner to be a non-superuser. That
seems like two different efforts to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-01-30 Thread Robert Haas
On Mon, Jan 30, 2023 at 4:12 PM Jacob Champion  wrote:
> For our case, assuming that connections have side effects, one
> solution could be for the client to signal to the server that the
> connection should use in-band authentication only; i.e. fail the
> connection if the credentials provided aren't good enough by
> themselves to authenticate the client. (This has some overlap with
> SASL negotiation, maybe.)

I'm not an expert on this stuff, but to me that feels like a weak and
fuzzy concept. If the client is going to tell the server something,
I'd much rather have it say something like "i'm proxying a request
from my local user rhaas, who authenticated using such and such a
method and connected from such and such an IP yadda yadda". That feels
to me like really clear communication that the server can then be
configured to do something about via pg_hba.conf or similar. Saying "use
in-band authentication only", to me, feels much murkier. As the
recipient of that message, I don't know exactly what to do about it,
and it feels like whatever heuristic I adopt might end up being wrong
and something bad happens anyway.

> I agree. (But for the record, I think that an outbound proxy filter is
> also insufficient. Someone, somewhere, is going to want to safely
> proxy through localhost _and_ have peer authentication set up.)

Well then they're indeed going to need some way to distinguish a
proxied connection from a non-proxied one. You can't send identical
connection requests in different scenarios and get different
results.

> I guess I didn't call it out explicitly, so it was fair to assume that
> it did not. I don't think we should ignore those cases.

OK, cool.

> But if we let the configuration focus on policies instead, and
> simultaneously improve the confused-deputy problem, then any IP/host
> filter functionality that we provide becomes an additional safety
> measure instead of your only viable line of defense. "I screwed up our
> IP filter, but we're still safe because the proxy refused to forward
> its client cert to the backend." Or, "this other local application
> requires peer authentication, but it's okay because the proxy
> disallows those connections by default."

Defense in depth is good.

> > Your idea
> > seems to rely on us being able to identify all of the policies that a
> > user is likely to want and give names to each one, and I don't feel
> > very confident that that's realistic. But maybe I'm misinterpreting
> > your idea?
>
> No, that's pretty accurate. But I'm used to systems that provide a
> ridiculous number of policies [1, 2] via what's basically a scoped
> property bag. "Turn off option 1 and 2 globally. For host A and IP
> address B, turn on option 1 as an exception." And I don't really
> expect us to need as many options as those systems do.
>
> I think that configuration style evolves well, it focuses on the right
> things, and it can still handle IP lists intuitively [3], if that's
> the way a DBA really wants to set up policies.

I think what we really need here is an example or three of a proposed
configuration file syntax. I think it would be good if we could pick a
syntax that doesn't require a super-complicated parser, and that maybe
has something in common with our existing configuration file syntaxes.
But if we have to invent something new, then we can do that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Transparent column encryption

2023-01-30 Thread Jacob Champion
On Wed, Jan 25, 2023 at 11:00 AM Peter Eisentraut
 wrote:
> > When writing the paragraph on RSA-OAEP I was reminded that we didn't
> > really dig into the asymmetric/symmetric discussion. Assuming that most
> > first-time users will pick the builtin CMK encryption method, do we
> > still want to have an asymmetric scheme implemented first instead of a
> > symmetric keywrap? I'm still concerned about that public key, since it
> > can't really be made public.
>
> I had started coding that, but one problem was that the openssl CLI
> doesn't really provide any means to work with those kinds of keys.  The
> "openssl enc" command always wants to mix in a password.  Without that,
> there is no way to write a test case, and more crucially no way for
> users to set up these kinds of keys.  Unless we write our own tooling
> for this, which, you know, the patch just passed 400k in size.

Arrgh: https://github.com/openssl/openssl/issues/10605

> > The column encryption algorithm is set per-column -- but isn't it
> > tightly coupled to the CEK, since the key length has to match? From a
> > layperson perspective, using the same key to encrypt the same plaintext
> > under two different algorithms (if they happen to have the same key
> > length) seems like it might be cryptographically risky. Is there a
> > reason I should be encouraged to do that?
>
> Not really.  I was also initially confused by this setup, but that's how
> other similar systems are set up, so I thought it would be confusing to
> do it differently.

Which systems let you mix and match keys and algorithms this way? I'd
like to take a look at them.

> > With the loss of \gencr it looks like we also lost a potential way to
> > force encryption from within psql. Any plans to add that for v1?
>
> \gencr didn't do that either.  We could do it.  The libpq API supports
> it.  We just need to come up with some syntax for psql.

Do you think people would rather set encryption for all parameters at
once -- something like \encbind -- or have the ability to mix
encrypted and unencrypted parameters?

> > Are there plans to document client-side implementation requirements, to
> > ensure cross-client compatibility? Things like the "PG\x00\x01"
> > associated data are buried at the moment (or else I've missed them in
> > the docs). If you're holding off until the feature is more finalized,
> > that's fine too.
>
> This is documented in the protocol chapter, which I thought was the
> right place.  Did you want more documentation, or in a different place?

I just missed it; sorry.

> > Speaking of cross-client compatibility, I'm still disconcerted by the
> > ability to write the value "hello world" into an encrypted integer
> > column. Should clients be required to validate the text format, using
> > the attrealtypid?
>
> Well, we can ask them to, but we can't really require them, in a
> cryptographic sense.  I'm not sure what more we can do.

Right -- I just mean that clients need to pay more attention to it
now, whereas before they may have delegated correctness to the server.
The problem is documented in the context of deterministic encryption,
but I think it applies to randomized as well.

More concretely: should psql allow you to push arbitrary text into an
encrypted \bind parameter, like it does now?

> > It occurred to me when looking at the "unspecified" CMK scheme that the
> > CEK doesn't really have to be an encryption key at all. In that case it
> > can function more like a (possibly signed?) cookie for lookup, or even
> > be ignored altogether if you don't want to use a wrapping scheme
> > (similar to JWE's "direct" mode, maybe?). So now you have three ways to
> > look up or determine a column encryption key (CMK realm, CMK name, CEK
> > cookie)... is that a concept worth exploring in v1 and/or the documentation?
>
> I don't completely follow this.

Yeah, I'm not expressing it very well. My feeling is that the
organization system here -- a realm "contains" multiple CMKs, a CMK
encrypts multiple CEKs -- is so general and flexible that it may need
some suggested guardrails for people to use it sanely. I just don't
know what those guardrails should be. I was motivated by the
realization that CEKs don't even need to be keys.

Thanks,
--Jacob




Re: pub/sub - specifying optional parameters without values.

2023-01-30 Thread Peter Smith
On Tue, Jan 31, 2023 at 4:00 AM Tom Lane  wrote:
>
> Peter Smith  writes:
> > The v3 patch LGTM (just for the logical replication commands).
>
> Pushed then.
>

Thanks for pushing the v3 patch.

I'd forgotten about the 'streaming' option -- AFAIK this was
previously a boolean parameter, so its [= value] part could be
omitted. However, in PG16 streaming became an enum type
(on/off/parallel); the value can still be omitted, but that case is not
really covered by the new generic text note about booleans added
by yesterday's patch.

e.g. The enum 'streaming' value part can still be omitted.
test_sub=# create subscription sub1 connection 'host=localhost
dbname=test_pub' publication pub1 with (streaming);

Perhaps a small top-up patch to CREATE SUBSCRIPTION is needed to
describe this special case?

PSA.

(I thought mentioning this special streaming case again for ALTER
SUBSCRIPTION might be overkill)

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Added-note-for-streaming-parameter-with-value-par.patch
Description: Binary data


Re: pub/sub - specifying optional parameters without values.

2023-01-30 Thread Tom Lane
Peter Smith  writes:
> I'd forgotten about the 'streaming' option -- AFAIK this was
> previously a boolean parameter and so its [= value] part can also be
> omitted. However, in PG16 streaming became an enum type
> (on/off/parallel), and the value can still be omitted but that is not
> really being covered by the new generic text note about booleans added
> by yesterday's patch.

Hmph.  I generally think that options defined like this (it's a boolean,
except it isn't) are a bad idea, and would prefer to see that API
rethought while we still can.

regards, tom lane




Re: recovery modules

2023-01-30 Thread Michael Paquier
On Mon, Jan 30, 2023 at 12:04:22PM -0800, Nathan Bossart wrote:
> On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote:
>> I don't think _PG_archive_module_init() should actually allocate a memory
>> context and do other similar initializations. Instead it should just return
>> 'const ArchiveModuleCallbacks*', typically a single line.
>> 
>> Allocations etc should happen in one of the callbacks. That way we can
>> actually have multiple instances of a module.
> 
> I think we'd need to invent a startup callback for archive modules for this
> to work, but that's easy enough.

If you don't return some (void *) pointing to a private area that
would be stored by the backend, allocated as part of the loading path,
I agree that an extra callback is what makes the most sense,
presumably called around the beginning of PgArchiverMain().  Doing
this kind of one-time action in the file callback would be weird.
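
(For illustration only -- under the scheme Andres describes, and assuming a
new startup callback is added alongside the existing ones, a module might end
up shaped roughly like this; everything other than the existing callback
names is hypothetical:)

static const ArchiveModuleCallbacks my_archive_callbacks = {
	.startup_cb = my_archive_startup,	/* hypothetical new callback */
	.check_configured_cb = my_archive_configured,
	.archive_file_cb = my_archive_file,
};

const ArchiveModuleCallbacks *
_PG_archive_module_init(void)
{
	/* no allocations or context setup here; that would move to startup_cb */
	return &my_archive_callbacks;
}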
--
Michael


signature.asc
Description: PGP signature


Re: heapgettup refactoring

2023-01-30 Thread Melanie Plageman
v7 attached

On Fri, Jan 27, 2023 at 10:34 PM David Rowley  wrote:
>
> "On Wed, 25 Jan 2023 at 10:17, Melanie Plageman
>  wrote:
> > I've added a comment but I didn't include the function name in it -- I
> > find it repetitive when the comments above functions do that -- however,
> > I'm not strongly attached to that.
>
> I think the general format for header comments is:
>
> /*
>  * <function name>
>  *\t\t<short description of what the function does>
>  *
>  * [Further details]
>  */
>
> We've certainly got places that don't follow that, but I don't think
> that's any reason to have no comment or invent some new format.
>
> heapam.c seems to have some other format where we do: "<function name>
> - <description>". I generally just try to copy
> the style from the surrounding code. I think generally, people won't
> argue if you follow the style from the surrounding code, but there'd
> be exceptions to that, I'm sure.

I have followed the same convention as the other functions in heapam.c
in the various helper functions comments I've added in this version.

> v6-0002:
>
> 1. heapgettup_initial_block() needs a header comment to mention what
> it does and what it returns. It would be good to make it obvious that
> it returns InvalidBlockNumber when there are no blocks to scan.

I've done this.

>
> 2. After heapgettup_initial_block(), you're checking "if (block ==
> InvalidBlockNumber). It might be worth a mention something like
>
> /*
>  * Check if we got to the end of the scan already.  This could happen for
>  * an empty relation or if parallel workers scanned everything before we
>  * got a chance.
>  */
>
> the backward scan comment wouldn't mention parallel workers.

I've done this as well.

>
> v6-0003:
>
> 3. Can you explain why you removed the snapshot local variable in 
> heapgettup()?

In the subsequent commit, the helpers I add call TestForOldSnapshot(),
and I didn't want to pass in the snapshot as a separate parameter since
I already need to pass the scan descriptor. I thought it was confusing
to have a local variable (snapshot) used in some places and the one in
the scan used in others. This "streamlining" commit also reduces the
number of times the snapshot variable is used, making it less necessary
to have a local variable.

I didn't remove the snapshot local variable in the same commit as adding
the helpers because I thought it made the diff harder to understand (for
review, the final commit should likely not be separate patches).

> 4. I think it might be a good idea to use unlikely() in if
> (!scan->rs_inited). The idea is to help coax the compiler into moving
> that code off to a cold path. That's likely especially important if
> heapgettup_initial_block is inlined, which I see it is marked as.

I've gone ahead and added unlikely. However, should I perhaps skip
inlining the heapgettup_initial_block() function?

> v6-0004:
>
> 5. heapgettup_start_page() and heapgettup_continue_page() both need a
> header comment to explain what they do and what the inputs and output
> arguments are.

I've added these. I've also removed an unused parameter to both, block.

>
> 6. I'm not too sure what the following comment means:
>
> /* block and lineoff now reference the physically next tid */
>
> "block" is just a parameter to the function and its value is not
> adjusted. The comment makes it sound like something was changed.
>
> (I think these might be just not well updated from having split this
> out from the 0006 patch as the same comment makes more sense in 0006)

Yes, that is true. I've updated it to just mention lineoff.

> v6-0005:
>
> 7. heapgettup_advance_block() needs a header comment.
>
> 8. Is there a reason why heapgettup_advance_block() handle backward
> scans first? I'd expect you should just follow the lead of the other
> functions and do ScanDirectionIsForward first.

The reason I do this is that backwards scans cannot be parallel, so
handling backwards scans first lets me return, then handle parallel
scans, then forward scans. This reduced the level of nesting/if
statements for all of the code in this function.

- Melanie
From 62416fee879a1ea6a7ca7ec132227701d35d7468 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 9 Jan 2023 17:33:43 -0500
Subject: [PATCH v7 2/6] Add heapgettup_initial_block() helper

---
 src/backend/access/heap/heapam.c | 226 ++-
 1 file changed, 102 insertions(+), 124 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 75f68b4e79..ff2e64822a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -484,6 +484,67 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 }
 
 
+/*
+ * heapgettup_initial_block - helper for heapgettup() and heapgettup_pagemode()
+ *
+ * Determines and returns the first block to scan, given the scan direction and
+ * whether or not it is parallel. If the relation is empty or the current
+ * parallel worker finds the scan has already been completed by other workers,
+ * return InvalidBlockNumber.
+ *
+ * Note th

monitoring usage count distribution

2023-01-30 Thread Nathan Bossart
My colleague Jeremy Schneider (CC'd) was recently looking into usage count
distributions for various workloads, and he mentioned that it would be nice
to have an easy way to do $SUBJECT.  I've attached a patch that adds a
pg_buffercache_usage_counts() function.  This function returns a row per
possible usage count with some basic information about the corresponding
buffers.

postgres=# SELECT * FROM pg_buffercache_usage_counts();
 usage_count | buffers | dirty | pinned 
-------------+---------+-------+--------
           0 |       0 |     0 |      0
           1 |    1436 |   671 |      0
           2 |     102 |    88 |      0
           3 |      23 |    21 |      0
           4 |       9 |     7 |      0
           5 |     164 |   106 |      0
(6 rows)

This new function provides essentially the same information as
pg_buffercache_summary(), but pg_buffercache_summary() only shows the
average usage count for the buffers in use.  If there is interest in this
idea, another approach to consider could be to alter
pg_buffercache_summary() instead.
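
(As an aside, a roughly equivalent distribution can already be computed from
the existing pg_buffercache view -- unused buffers show up there with a NULL
usagecount:)

SELECT usagecount AS usage_count,
       count(*) AS buffers,
       count(*) FILTER (WHERE isdirty) AS dirty,
       count(*) FILTER (WHERE pinning_backends > 0) AS pinned
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;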

Thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b2fd87696185537d2cbb611cf70e743d7e197406 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 Jan 2023 16:39:43 -0800
Subject: [PATCH v1 1/1] introduce pg_buffercache_usage_counts()

---
 .../expected/pg_buffercache.out   |  14 +++
 .../pg_buffercache--1.3--1.4.sql  |  13 +++
 contrib/pg_buffercache/pg_buffercache_pages.c |  46 
 contrib/pg_buffercache/sql/pg_buffercache.sql |   4 +
 doc/src/sgml/pgbuffercache.sgml   | 101 +-
 5 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/contrib/pg_buffercache/expected/pg_buffercache.out b/contrib/pg_buffercache/expected/pg_buffercache.out
index 635f01e3b2..b745dc69ea 100644
--- a/contrib/pg_buffercache/expected/pg_buffercache.out
+++ b/contrib/pg_buffercache/expected/pg_buffercache.out
@@ -17,6 +17,12 @@ from pg_buffercache_summary();
  t| t| t
 (1 row)
 
+SELECT count(*) > 0 FROM pg_buffercache_usage_counts() WHERE buffers >= 0;
+ ?column? 
+--
+ t
+(1 row)
+
 -- Check that the functions / views can't be accessed by default. To avoid
 -- having to create a dedicated user, use the pg_database_owner pseudo-role.
 SET ROLE pg_database_owner;
@@ -26,6 +32,8 @@ SELECT * FROM pg_buffercache_pages() AS p (wrong int);
 ERROR:  permission denied for function pg_buffercache_pages
 SELECT * FROM pg_buffercache_summary();
 ERROR:  permission denied for function pg_buffercache_summary
+SELECT * FROM pg_buffercache_usage_counts();
+ERROR:  permission denied for function pg_buffercache_usage_counts
 RESET role;
 -- Check that pg_monitor is allowed to query view / function
 SET ROLE pg_monitor;
@@ -41,3 +49,9 @@ SELECT buffers_used + buffers_unused > 0 FROM pg_buffercache_summary();
  t
 (1 row)
 
+SELECT count(*) > 0 FROM pg_buffercache_usage_counts();
+ ?column? 
+--
+ t
+(1 row)
+
diff --git a/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql b/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
index 8f212dc5e9..f4702e4b4b 100644
--- a/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
+++ b/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
@@ -15,3 +15,16 @@ LANGUAGE C PARALLEL SAFE;
 -- Don't want these to be available to public.
 REVOKE ALL ON FUNCTION pg_buffercache_summary() FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION pg_buffercache_summary() TO pg_monitor;
+
+CREATE FUNCTION pg_buffercache_usage_counts(
+OUT usage_count int4,
+OUT buffers int4,
+OUT dirty int4,
+OUT pinned int4)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_buffercache_usage_counts'
+LANGUAGE C PARALLEL SAFE;
+
+-- Don't want these to be available to public.
+REVOKE ALL ON FUNCTION pg_buffercache_usage_counts() FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_buffercache_usage_counts() TO pg_monitor;
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 1c6a2f22ca..f333967c51 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -18,6 +18,7 @@
 #define NUM_BUFFERCACHE_PAGES_MIN_ELEM	8
 #define NUM_BUFFERCACHE_PAGES_ELEM	9
 #define NUM_BUFFERCACHE_SUMMARY_ELEM 5
+#define NUM_BUFFERCACHE_USAGE_COUNTS_ELEM 4
 
 PG_MODULE_MAGIC;
 
@@ -61,6 +62,7 @@ typedef struct
  */
 PG_FUNCTION_INFO_V1(pg_buffercache_pages);
 PG_FUNCTION_INFO_V1(pg_buffercache_summary);
+PG_FUNCTION_INFO_V1(pg_buffercache_usage_counts);
 
 Datum
 pg_buffercache_pages(PG_FUNCTION_ARGS)
@@ -304,3 +306,47 @@ pg_buffercache_summary(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(result);
 }
+
+Datum
+pg_buffercache_usage_counts(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	int			usage_counts[BM_MAX_USAGE_COUNT + 1] = {0};
+	int			dirty[BM_MAX_USAGE_COUNT + 1] = {0};
+	int			pinned[BM_MAX_USAGE_COUNT + 1] = {0};
+

Re: Making background psql nicer to use in tap tests

2023-01-30 Thread Andres Freund
Hi,

On 2023-01-30 15:06:46 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > It's annoyingly hard to wait for the result of a query in a generic way with
> > background_psql(), and more generally for psql. background_psql() uses 
> > -XAtq,
> > which means that we'll not get "status" output (like "BEGIN" or "(1 row)"),
> > and that queries not returning anything are completely invisible.
>
> Yeah, the empty-query-result problem was giving me fits recently.
> +1 for wrapping this into something more convenient to use.

I've hacked some on this. I first tried to just introduce a few helper
functions in Cluster.pm, but that ended up being awkward. So I bit the bullet
and introduced a new class (in BackgroundPsql.pm), and made background_psql()
and interactive_psql() return an instance of it.

This is just a rough prototype. Several function names don't seem great, it
need POD documentation, etc.


The main convenience things it has over the old interface:
- $node->background_psql('dbname') is enough
- $psql->query(), which returns the query results as a string, is a lot easier
  to use than having to pump, identify query boundaries via regex etc.
- $psql->query_safe(), which dies if any query fails (detected via stderr)
- $psql->query_until() is a helper that makes it a bit easier to start queries
  that won't finish until a later point
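
As a rough illustration of the interface (method names as listed above,
everything else made up), a test could do something like:

my $psql = $node->background_psql('postgres');

# query() returns the query's output as a string
my $count = $psql->query('SELECT count(*) FROM pg_class');

# query_safe() dies if anything shows up on stderr
$psql->query_safe('CREATE TABLE bg_tab (a int)');

# query_until() sends input and returns once the output matches the pattern,
# leaving the long-running query behind for a later test step
$psql->query_until(qr/started/, q(
    \echo started
    SELECT pg_sleep(3600);
));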


I don't quite like the new interface yet:
- It's somewhat common to want to know if there was a failure, but also get
  the query result, not sure what the best function signature for that is in
  perl.
- query_until() sounds a bit too much like $node->poll_query_until(). Maybe
  query_wait_until() is better? OTOH, the other function has poll in the name,
  so maybe it's ok.
- right now there's a bit too much logic in background_psql() /
  interactive_psql() for my taste


Those points aside, I think it already makes the tests a good bit more
readable. My WIP vacuum_defer_cleanup_age patch shrunk by half with it.

I think with a bit more polish it's easy enough to use that we could avoid a
good number of those one-off psql's that we do all over.


I didn't really know what this, in src/test/subscription/t/015_stream.pl, is
about:

$h->finish;# errors make the next test fail, so ignore them here

There's no further test?

I'm somewhat surprised it doesn't cause problems in another ->finish later on,
where we then afterwards just use $h again. Apparently IPC::Run just
automagically restarts psql?


Greetings,

Andres Freund
>From fb0e9fceead01aaee0bdd05c3ad6c67814f6b820 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 30 Jan 2023 15:39:08 -0800
Subject: [PATCH v1] WIP: test: Introduce BackgroundPsql class

Discussion: https://postgr.es/m/882122.1675109...@sss.pgh.pa.us
---
 src/bin/psql/t/010_tab_completion.pl  |  27 +--
 contrib/amcheck/t/003_cic_2pc.pl  |  66 +++
 .../perl/PostgreSQL/Test/BackgroundPsql.pm| 165 ++
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  83 ++---
 src/test/recovery/t/031_recovery_conflict.pl  | 101 +++
 src/test/subscription/t/015_stream.pl |  52 ++
 6 files changed, 252 insertions(+), 242 deletions(-)
 create mode 100644 src/test/perl/PostgreSQL/Test/BackgroundPsql.pm

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index 7746c75e0c9..7c382b98f8f 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -92,14 +92,7 @@ print $FH "other stuff\n";
 close $FH;
 
 # fire up an interactive psql session
-my $in  = '';
-my $out = '';
-
-my $timer = timer($PostgreSQL::Test::Utils::timeout_default);
-
-my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
-
-like($out, qr/psql/, "print startup banner");
+my $h = $node->interactive_psql('postgres');
 
 # Simple test case: type something and see if psql responds as expected
 sub check_completion
@@ -109,15 +102,12 @@ sub check_completion
 	# report test failures from caller location
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	# reset output collector
-	$out = "";
 	# restart per-command timer
-	$timer->start($PostgreSQL::Test::Utils::timeout_default);
-	# send the data to be sent
-	$in .= $send;
-	# wait ...
-	pump $h until ($out =~ $pattern || $timer->is_expired);
-	my $okay = ($out =~ $pattern && !$timer->is_expired);
+	$h->{timeout}->start($PostgreSQL::Test::Utils::timeout_default);
+
+	# send the data to be sent and wait for its result
+	my $out = $h->query_until($pattern, $send);
+	my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
 	ok($okay, $annotation);
 	# for debugging, log actual output if it didn't match
 	local $Data::Dumper::Terse = 1;
@@ -443,10 +433,7 @@ check_completion("blarg \t\t", qr//, "check completion failure path");
 clear_query();
 
 # send psql an explicit \q to shut it down, else pty won't close properly
-$timer->start($PostgreSQL::Test::Utils::timeout_default);
-$in .= "\\q\n";
-fin

Re: generate_series for timestamptz and time zone problem

2023-01-30 Thread Tom Lane
Gurjeet Singh  writes:
> [ generate_series_with_timezone.v6.patch ]

The cfbot isn't terribly happy with this.  It looks like UBSan
is detecting some undefined behavior.  Possibly an uninitialized
variable?

regards, tom lane




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-30 Thread Peter Smith
On Mon, Jan 30, 2023 at 5:23 PM houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 30, 2023 12:13 PM Peter Smith  
> wrote:
> >
> > Here are my review comments for v88-0002.
>
> Thanks for your comments.
>
> >
> > ==
> > General
> >
> > 1.
> > The test cases are checking the log content but they are not checking for
> > debug logs or untranslated elogs -- they are expecting a normal ereport LOG
> > that might be translated. I’m not sure if that is OK, or if it is a 
> > potential problem.
>
> We have tests that check the ereport ERROR and ereport WARNING message(by
> search for the ERROR or WARNING keyword for all the tap tests), so I think
> checking the LOG should be fine.
>
> > ==
> > doc/src/sgml/config.sgml
> >
> > 2.
> > On the publisher side, logical_replication_mode allows allows streaming or
> > serializing changes immediately in logical decoding. When set to immediate,
> > stream each change if streaming option (see optional parameters set by
> > CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set
> > to buffered, the decoding will stream or serialize changes when
> > logical_decoding_work_mem is reached.
> >
> > 2a.
> > typo "allows allows"  (Kuroda-san reported same)
> >
> > 2b.
> > "if streaming option" --> "if the streaming option"
>
> Changed.

Although you replied "Changed" for the above, AFAICT my review comment
#2b. was accidentally missed.

Otherwise, the patch LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Introduce "log_connection_stages" setting.

2023-01-30 Thread Justin Pryzby
Thanks for updating the patch.  It's currently failing check-world, due
to a test that was added on January 23 (a9dc7f941):

http://cfbot.cputube.org/sergey-dudoladov.html
[19:15:57.101] Summary of Failures:
[19:15:57.101] [19:15:57.101] 250/251 postgresql:ldap / ldap/002_bindpasswd 
ERROR  1.38s

2023-01-30 19:15:52.427 GMT [56038] LOG:  unrecognized configuration parameter 
"log_connections" in file 
"/tmp/cirrus-ci-build/build/testrun/ldap/002_bindpasswd/data/t_002_bindpasswd_node_data/pgdata/postgresql.conf"
 line 839

> +received, but no further work has been done: Postgres is about to 
> start

say "PostgreSQL" to match the rest of the docs.

> +   GUC_check_errmsg("Invalid value '%s'", stage);

This message shouldn't be uppercased.

> + GUC_check_errdetail("Valid values to use in the list 
> are 'received', 'authenticated', 'authorized', 'disconnected', and 'all'."
> + "If 'all' is present, it must be the only value in the 
> list.");

Maybe "all" should be first ?

There's no spaces before "If":

| 2023-01-31 00:17:48.906 GMT [5676] DETALLE:  Valid values to use in the list 
are 'received', 'authenticated', 'authorized', 'disconnected', and 'all'.If 
'all' is present, it must be the only value in the list.

> +/* flags for logging information about session state */
> +int  Log_connection_messages = LOG_CONNECTION_ALL;

The initial value here is overwritten by the GUC default during startup.
For consistency, the integer should be initialized to 0.

> +extern PGDLLIMPORT int   Log_connection_messages;
> +
> +/* Bitmap for logging connection messages */
> +#define LOG_CONNECTION_RECEIVED   1
> +#define LOG_CONNECTION_AUTHENTICATED 2
> +#define LOG_CONNECTION_AUTHORIZED 4
> +#define LOG_CONNECTION_DISCONNECTED  8
> +#define LOG_CONNECTION_ALL15

Maybe the integers should be written like (1<<0)..
And maybe ALL should be 0x ?
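
(i.e. something like this, just to illustrate the suggested style for the
existing flags:)

#define LOG_CONNECTION_RECEIVED      (1 << 0)
#define LOG_CONNECTION_AUTHENTICATED (1 << 1)
#define LOG_CONNECTION_AUTHORIZED    (1 << 2)
#define LOG_CONNECTION_DISCONNECTED  (1 << 3)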

More nitpicks:

> +Causes stages of each attempted connection to the server to be 
> logged. Example: authorized,disconnected.

"Causes the specified stages of each connection attempt .."

> +The default is the empty string meaning nothing is logged.

".. string, which means that nothing is logged"

> +Logs the original identity that an authentication method 
> employs
> +to identify a user. In most cases, the identity string matches the

".. original identity used by an authentication method ..'

-- 
Justin




Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-30 Thread Robins Tharakan
Hi Andres,

On Tue, 31 Jan 2023 at 06:31, Nathan Bossart  wrote:
>
> On Mon, Jan 30, 2023 at 11:49:37AM -0800, Andres Freund wrote:
> > Why don't the dblink tests catch this?  Any chance you or Robins could 
> > prepare
> > a patch with fix and test, given that you know how to trigger this?
>
> It's trivially reproducible by calling 1-argument dblink_connect() multiple
> times and then calling dblink_disconnect().  Here's a patch.
>
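
(In SQL terms, the reproduction described above amounts to roughly the
following; the connection string is only an example:)

SELECT dblink_connect('dbname=postgres');
SELECT dblink_connect('dbname=postgres');
SELECT dblink_disconnect();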

My test instance has been running Nathan's patch for the past
few hours, and it looks like this should resolve the issue.

A little bit of background, since the past few days I was
noticing frequent crashes on a test instance. They're not simple
to reproduce manually, but if left running I can reliably see
crashes on an ~hourly basis.

In trying to trace the origin, I ran a multiple-hour test for
each commit going back a few commits and noticed that the
crashes stopped prior to commit e4602483e9, which is when the
issue became visible.

The point is now moot, but let me know if full backtraces still
help (I was concerned looking at the excerpts from some
of the crashes):

"double free or corruption (out)"
"munmap_chunk(): invalid pointer"
"free(): invalid pointer"
str->data[0] = '\0';


Some backtraces
###

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 6c6d6ba3ee@master@sqith: u21 postgres
127.0.0.1(45334) SELECT   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI___libc_free (mem=0x312f352f65736162) at malloc.c:3102
#0  __GI___libc_free (mem=0x312f352f65736162) at malloc.c:3102
#1  0x7fc0062dfefd in pqDropConnection (conn=0x564bb3e1a080,
flushInput=true) at fe-connect.c:495
#2  0x7fc0062e5cb3 in closePGconn (conn=0x564bb3e1a080)
at fe-connect.c:4112
#3  0x7fc0062e5d55 in PQfinish (conn=0x564bb3e1a080) at fe-connect.c:4134
#4  0x7fc0061a442b in libpqsrv_disconnect (conn=0x564bb3e1a080)
at ../../src/include/libpq/libpq-be-fe-helpers.h:117
#5  0x7fc0061a4df1 in dblink_disconnect (fcinfo=0x564bb3d980f0)
at dblink.c:357
#6  0x564bb0e70aa7 in ExecInterpExpr (state=0x564bb3d98018,
econtext=0x564bb3d979a0, isnull=0x7ffd60824b0f) at execExprInterp.c:728
#7  0x564bb0e72f36 in ExecInterpExprStillValid (state=0x564bb3d98018,
econtext=0x564bb3d979a0, isNull=0x7ffd60824b0f) at execExprInterp.c:1838




Core was generated by `postgres: 6c6d6ba3ee@master@sqith: u52 postgres
127.0.0.1(58778) SELECT   '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7fc021792859 in __GI_abort () at abort.c:79
#2  0x7fc0217fd26e in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7fc021927298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x7fc0218052fc in malloc_printerr (
str=str@entry=0x7fc021929670 "double free or corruption (out)")
at malloc.c:5347
#4  0x7fc021806fa0 in _int_free (av=0x7fc02195cb80 ,
p=0x7fc02195cbf0 , have_lock=)
at malloc.c:4314
#5  0x7fc0062e16ed in freePGconn (conn=0x564bb6e36b80)
at fe-connect.c:3977
#6  0x7fc0062e1d61 in PQfinish (conn=0x564bb6e36b80) at fe-connect.c:4135
#7  0x7fc0062a142b in libpqsrv_disconnect (conn=0x564bb6e36b80)
at ../../src/include/libpq/libpq-be-fe-helpers.h:117
#8  0x7fc0062a1df1 in dblink_disconnect (fcinfo=0x564bb5647b58)
at dblink.c:357



Program terminated with signal SIGSEGV, Segmentation fault.
#0  resetPQExpBuffer (str=0x559d3af0e838) at pqexpbuffer.c:153
153 str->data[0] = '\0';
#0  resetPQExpBuffer (str=0x559d3af0e838) at pqexpbuffer.c:153
#1  0x7f2240b0a876 in PQexecStart (conn=0x559d3af0e410) at fe-exec.c:2320
#2  0x7f2240b0a688 in PQexec (conn=0x559d3af0e410,
query=0x559d56fb8ee8 "p3") at fe-exec.c:2227
#3  0x7f223ba8c7e4 in dblink_exec (fcinfo=0x559d3b101f58) at dblink.c:1432
#4  0x559d2f003c82 in ExecInterpExpr (state=0x559d3b101ac0,
econtext=0x559d34e76578, isnull=0x7ffe3d590fdf) at execExprInterp.c:752




Core was generated by `postgres: 728f86fec6@(HEAD detached at
728f86fec6)@sqith: u19 postgres 127.0.0.'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f96f5fc6e99 in SSL_shutdown ()
   from /lib/x86_64-linux-gnu/libssl.so.1.1
#0  0x7f96f5fc6e99 in SSL_shutdown ()
   from /lib/x86_64-linux-gnu/libssl.so.1.1
#1  0x7f96da56027a in pgtls_close (conn=0x55d919752fb0)
at fe-secure-openssl.c:1555
#2  0x7f96da558e41 in pqsecure_close (conn=0x55d919752fb0)
at fe-secure.c:192
#3  0x7f96da53dd12 in pqDropConnection (conn=0x55d919752fb0,
flushInput=true) at fe-connect.c:449
#4  0x7f96da543cb3 in closePGconn (conn=0x55d919752fb0)
at fe-connect.c:4112
#5  0x7f96da543d55 in PQfinish (conn=0x55d919752fb0) at fe-connect.c:

Re: pg_upgrade test failure

2023-01-30 Thread Thomas Munro
On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro  wrote:
> On Wed, Dec 7, 2022 at 7:15 AM Andres Freund  wrote:
> > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> > > So [1] on its own didn't fix this.  My next guess is that the attached
> > > might help.
>
> > What is our plan here? This afaict is the most common "false positive" for
> > cfbot in the last weeks.

I pushed the rmtree() change.  Let's see if that helps, or tells us
something new.

Michael:  There were some questions from Andres above.  FWIW I think
if you wanted to investigate this properly on a local Windows system
to chase down who's got the file open (shutdown sequence problem or
whatever), you'd probably have to install Server 2019, or maybe use an
old 8.1 VM if you still have such a thing, based on the suspicion that
typical 10 and 11 systems won't exhibit the problem.  But then I could
be wrong about what's going on...




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-01-30 Thread Anton A. Melnikov

Hello!

On 24.11.2022 04:02, Thomas Munro wrote:

On Thu, Nov 24, 2022 at 11:05 AM Tom Lane  wrote:

Thomas Munro  writes:


ERROR:  calculated CRC checksum does not match value stored in file

The attached draft patch fixes it.


Tried to catch this error on my PC, but failed to do it within a reasonable 
time.
The 1s interval is probably too long for me.
It seems there is a more reliable way to reproduce this bug with the 0001 patch
applied:
At the first backend:

do $$ begin loop perform pg_update_control_file(); end loop; end; $$;

At the second one:

do $$ begin loop perform pg_control_system(); end loop; end; $$;

It will fails almost immediately with:
"ERROR:  calculated CRC checksum does not match value stored in file"
both with fsync = off and fsync = on.
Checked it out for master and REL_11_STABLE.

I also checked for a few hours that the 0002 patch fixes this error,
but I have some questions about its logical structure.
The equality between the previous and newly calculated CRC is checked only
if the last CRC calculation was wrong, i.e. not equal to the value stored
in the file. It is very unlikely that the previous and new CRC would match
in that case, so, in fact, the loop will spin until the CRC is calculated
correctly. In other words, this algorithm is practically equivalent to an
infinite loop of reading from the file and calculating the CRC
while(EQ_CRC32C(crc, ControlFile->crc) != true).
But then it could be simplified significantly by removing the checksum
equality checks and the bool first_try, and by limiting the maximum number
of iterations with some constant (e.g. in a for loop) to avoid a
theoretically possible freeze.

Or maybe use the variant suggested by Tom Lane, i.e., as far as I understand,
repeat the file_open-read-close-calculate_crc sequence twice without a pause
between the two runs and check whether the two calculated CRCs are equal.
If they match, exit and return the bool result of comparing the last
calculation with the value from the file; if not, take a pause and repeat
everything from the beginning.
In that case there is no need to check *crc_ok_p inside get_controlfile()
as the present version does; I think that is more logically correct,
since this variable is intended for top-level functions and the logic
inside get_controlfile() should not depend on its value.

Also found a warning in the 0001 patch on the master branch. On my PC gcc gives:

xlog.c:2507:1: warning: no previous prototype for ‘pg_update_control_file’ 
[-Wmissing-prototypes]
 2507 | pg_update_control_file()

Fixed it by adding #include "utils/fmgrprotos.h" to xlog.c and
PG_FUNCTION_ARGS to pg_update_control_file().

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-30 Thread Masahiko Sawada
On Mon, Jan 30, 2023 at 8:24 PM Amit Kapila  wrote:
>
> On Thu, Dec 8, 2022 at 8:17 AM Masahiko Sawada  wrote:
> >
> > On Mon, Nov 21, 2022 at 4:31 PM Amit Kapila  wrote:
> >
> > One idea to fix this issue is that in
> > ReplicationSlotsComputeRequiredXmin(), we compute the minimum xmin
> > while holding both ProcArrayLock and ReplicationSlotControlLock, and
> > release only ReplicationSlotsControlLock before updating the
> > replication_slot_xmin. I'm concerned it will increase the contention
> > on ProcArrayLock but I've attached the patch for discussion.
> >
>
> But what kind of workload are you worried about? This will be called
> while processing XLOG_RUNNING_XACTS to update
> procArray->replication_slot_xmin/procArray->replication_slot_catalog_xmin
> only when required. So, if we want we can test some concurrent
> workloads along with walsenders doing the decoding to check if it
> impacts performance.
>

I was slightly concerned about holding ProcArrayLock while iterating
over replication slots, especially when there are many replication
slots in the system. But you're right; we need it only when processing
XLOG_RUNNING_XACTS, and that's not frequent. So it doesn't introduce
visible overhead, or at worst a negligible one.

> What other way we can fix this? Do you think we can try to avoid
> retreating xmin values in ProcArraySetReplicationSlotXmin() to avoid
> this problem? Personally, I think taking the lock as proposed by your
> patch is a better idea.

Agreed.

> BTW, this problem seems to be only logical
> replication specific, so if we are too worried then we can change this
> locking only for logical replication.

Yes, but I agree that there won't be a big overhead by this fix.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Scan buffercache for a table

2023-01-30 Thread Amin
Hi,

I am looking for function calls to scan the buffer cache for a table and
find the cached pages. I want to find out which pages are cached and which
of them are dirty. Having the relation id, how can I do that? I have gone
through bufmgr.c and relcache.c, but could not find a way to get
relation-specific pages from the buffer cache.

Thank you!


Re: Allow an extention to be updated without a script

2023-01-30 Thread Yugo NAGATA
Hi,

Thank you for your comment.

On Mon, 30 Jan 2023 16:05:52 -0500
Tom Lane  wrote:

> Yugo NAGATA  writes:
> > Currently, even when we don't need to execute any command to update an
> > extension from one version to the next, we need to provide an update
> > script that doesn't contain any command. Preparing such meaningless
> > files are sometimes annoying.
> 
> If you have no update script, why call it a new version?  The point
> of extension versions is to distinguish different states of the
> extension's SQL objects.  We do not consider mods in underlying C code
> to justify a new version.

Although, as you say, the extension manager doesn't track changes in C code,
a new version could be released with changes only in C functions that
implement a new feature or fix some bugs, because it looks like a new
version from the user's point of view.  Actually, there are several
extensions that provide empty update scripts in order to allow users to
install such new versions, for example:

- pglogical
 
(https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical--2.4.1--2.4.2.sql)
- hll
 
(https://github.com/citusdata/postgresql-hll/blob/master/update/hll--2.16--2.17.sql)
- orafce
 (https://github.com/orafce/orafce/blob/master/orafce--3.12--3.13.sql)
- hypopg
 (https://github.com/HypoPG/hypopg/blob/REL1_STABLE/hypopg--1.3.1--1.3.2.sql)
- timescaledb
 
(https://github.com/timescale/timescaledb/blob/main/sql/updates/2.9.2--2.9.1.sql)
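
With the proposed option, such an extension could instead declare those no-op
updates in its control file -- e.g., reusing the 'old_version--target_version'
pattern described above, something like (exact syntax may differ from the
patch):

updates_without_script = '2.4.1--2.4.2'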


-- 
Yugo NAGATA 




Re: Scan buffercache for a table

2023-01-30 Thread Justin Pryzby
On Mon, Jan 30, 2023 at 06:01:08PM -0800, Amin wrote:
> Hi,
> 
> I am looking for function calls to scan the buffer cache for a table and
> find the cached pages. I want to find out which pages are cached and which
> of them are dirty. Having the relation id, how can I do that? I have gone
> through bufmgr.c and relcache.c, but could not find a way to get
> relation-specific pages from the buffer cache.

This looks like a re-post of the question you asked on Jan 13:
caf-ka8_axsmpqw1scotnaqx8nfhgmjc6l87qzao3jezlibu...@mail.gmail.com
It'd be better not to start a new thread (or if you do that, it'd be
better to mention the old one and include its participants).

On Fri, Jan 13, 2023 at 05:28:31PM -0800, Amin wrote:
> Hi,
> 
> Before scanning a relation, in the planner stage, I want to make a
> call to
> retrieve information about how many pages will be a hit for a specific
> relation. The module pg_buffercache seems to be doing a similar thing.

The planner is a *model* which (among other things) tries to guess how
many pages will be read/hit.  It's not expected to be anywhere near
accurate.

pg_buffercache only checks for pages within postgres' own buffer cache.
It doesn't look for pages which are in the OS page cache, which require
a system call to access (but don't require device I/O).
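
For the original question (which pages of a given relation are cached, and
which of those are dirty), something along these lines against the
pg_buffercache view should do; the table name is just an example:

SELECT relblocknumber, isdirty
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('my_table')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database())
ORDER BY relblocknumber;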

Read about pgfincore for introspection of the OS page cache.

> Also, pg_statio_all_tables seems to be having that information, but it
> is updated after execution. However, I want the information before
> execution.  Also not sure how pg_statio_all_tables is created and how
> I can access it in the code.

But the view isn't omniscient.  When you execute a plan, you don't know
how it's going to end.  If you did, you wouldn't need to run it - you
could just print the answer.

Note that planning and execution are separate and independent.  It's
possible to plan a query without ever running it, or to plan it once and
run it multiple times.  The view reflects I/O requested by postgres; the
I/O normally comes primarily from execution.

You can look at how the view is defined:
\sv pg_statio_all_tables

And then you can look at how the functions that it calls are implemented
(\df+).  Same for pg_buffercache.  It seems like you'll want to learn
how to navigate the source code to find how things are connected.

-- 
Justin




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-30 Thread houzj.f...@fujitsu.com
On Monday, January 30, 2023 10:20 PM Masahiko Sawada  
wrote:
> 
> 
> I have one comment on v89 patch:
> 
> +   /*
> +* Using 'immediate' mode returns false to cause a switch to
> +* PARTIAL_SERIALIZE mode so that the remaining changes will
> be serialized.
> +*/
> +   if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)
> +   return false;
> +
> 
> Probably we might want to add unlikely() here since we could pass through this
> path very frequently?

I think your comment makes sense, thanks.
I updated the patch for the same.

Best Regards,
Hou zj


v90-0001-Extend-the-logical_replication_mode-to-test-the-.patch
Description:  v90-0001-Extend-the-logical_replication_mode-to-test-the-.patch

