Re: bogus: logical replication rows/cols combinations

2022-04-30 Thread Alvaro Herrera
On 2022-Apr-28, Tomas Vondra wrote:

> Attached is a patch doing the same thing in tablesync. The overall idea
> is to generate a copy statement with CASE expressions, applying filters
> to individual columns. For Alvaro's example, this generates something
> like
> 
>   SELECT
> (CASE WHEN (a < 0) OR (a > 0) THEN a ELSE NULL END) AS a,
> (CASE WHEN (a > 0) THEN b ELSE NULL END) AS b,
> (CASE WHEN (a < 0) THEN c ELSE NULL END) AS c
>   FROM uno WHERE (a < 0) OR (a > 0)

I've been reading the tablesync.c code you propose and the idea seems
correct.  (I was distracted by wondering if a different data structure
would be more appropriate, because what's there looks slightly
uncomfortable to work with.  But after playing around I can't find
anything that feels better in an obvious way.)

(I confess I'm a bit bothered by the fact that there are now three
different data structures in our code called PublicationInfo).

I propose some comment changes in the attached patch, and my
interpretation (untested) of the idea of optimizing for a single
publication.  (In there I also rename logicalrep_relmap_free_entry
because it's confusing.  That should be a separate patch but I didn't
split it before posting, apologies.)

> There are a couple of options for how we might optimize this for
> common cases. For example, if there's just a single publication,
> there's no need to generate the CASE expressions - the WHERE filter
> will do the trick.

Right.
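
For illustration: with a single publication, the CASE wrappers collapse
and the row filter alone does the work.  Assuming a publication with
column list (a, b) and row filter (a > 0) on the same table, the
generated query could be simply (a sketch, not the patch's exact
output):

  COPY (SELECT a, b FROM uno WHERE (a > 0)) TO STDOUT;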

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
From b8abd662b36aa588a7566fa7da3a2b1cd78d5e51 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 29 Apr 2022 17:41:34 +0200
Subject: [PATCH 1/2] Change some comments

---
 src/backend/replication/logical/relation.c  | 15 ++---
 src/backend/replication/logical/tablesync.c | 72 -
 2 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..9307a6bf1d 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -119,14 +119,13 @@ logicalrep_relmap_init(void)
 }
 
 /*
- * Free the entry of a relation map cache.
+ * Clear the contents of a LogicalRepRelMapEntry, freeing any subsidiary
+ * allocated members.  The entry itself is not freed.
  */
 static void
-logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
+logicalrep_relmap_clear_entry(LogicalRepRelMapEntry *entry)
 {
-	LogicalRepRelation *remoterel;
-
-	remoterel = &entry->remoterel;
+	LogicalRepRelation *remoterel = &entry->remoterel;
 
 	pfree(remoterel->nspname);
 	pfree(remoterel->relname);
@@ -165,13 +164,13 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 		logicalrep_relmap_init();
 
 	/*
-	 * HASH_ENTER returns the existing entry if present or creates a new one.
+	 * Find a hashtable entry for this rel, or create a new one.  If there
+	 * was one, clear its contents so we can fill it fresh.
 	 */
 	entry = hash_search(LogicalRepRelMap, (void *) &remoterel->remoteid,
 		HASH_ENTER, &found);
-
 	if (found)
-		logicalrep_relmap_free_entry(entry);
+		logicalrep_relmap_clear_entry(entry);
 
 	memset(entry, 0, sizeof(LogicalRepRelMapEntry));
 
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index fd547e16f4..82ee406eb4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1100,15 +1100,14 @@ copy_table(Relation rel)
 	}
 	else
 	{
+		List	   *quals;
+		ListCell   *lc;
+
 		/*
 		 * For non-tables and tables with row filters, we need to do COPY
 		 * (SELECT ...), but we can't just do SELECT * because we need to not
 		 * copy generated columns. For tables with any row filters, build a
 		 * SELECT query with OR'ed row filters for COPY.
-		 *
-		 * FIXME can be simplified if all subscriptions have the same column
-		 * list (or no column list), in which case we don't need the CASE
-		 * expressions at all.
 		 */
 		appendStringInfoString(&cmd, "COPY (SELECT ");
 		for (int i = 0; i < lrel.natts; i++)
@@ -1146,7 +1145,17 @@ copy_table(Relation rel)
 	appendStringInfo(&qual, " OR %s", strVal(pubinfo->rowfilter));
 			}
 
-			if (no_filter)
+			/*
+			 * If there's no row filter (which includes the case of several
+			 * publications where at least one has no row filter), or there
+			 * is a single publication, we can emit the column name itself.
+			 * (The latter works because the WHERE clause will take care of
+			 * omitting non-matching rows.)  If there are several
+			 * publications, emit a CASE expression that only prints the
+			 * column if the row meets the row filter of any of the
+			 * publications.
+			 */
+			if (no_filter || list_length(pubinfos) == 1)
 appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
 			else
 appendString

Re: bogus: logical replication rows/cols combinations

2022-04-30 Thread Alvaro Herrera
On 2022-Apr-30, Amit Kapila wrote:

> On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
>  wrote:

> > That seems to deal with a circular replication, i.e. two logical
> > replication links - a bit like a multi-master. Not sure how is that
> > related to the issue we're discussing here?
> 
> It is not directly related to what we are discussing here but I was
> trying to emphasize the point that users need to define the logical
> replication via pub/sub sanely otherwise they might see some weird
> behaviors like that.

I agree with that.

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown.  Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.
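
For instance (an illustrative pair of definitions, not taken from the
patch), combining these two publications in a single subscription
publishes column b for some rows and column c for others:

  CREATE PUBLICATION pub_pos FOR TABLE uno (a, b) WHERE (a > 0);
  CREATE PUBLICATION pub_neg FOR TABLE uno (a, c) WHERE (a < 0);

That is the sort of obviously conflicting combination that could be
rejected outright.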

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: bogus: logical replication rows/cols combinations

2022-04-30 Thread Amit Kapila
On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera  wrote:
>
> On 2022-Apr-30, Amit Kapila wrote:
>
> > On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
> >  wrote:
>
> > > That seems to deal with a circular replication, i.e. two logical
> > > replication links - a bit like a multi-master. Not sure how is that
> > > related to the issue we're discussing here?
> >
> > It is not directly related to what we are discussing here but I was
> > trying to emphasize the point that users need to define the logical
> > replication via pub/sub sanely otherwise they might see some weird
> > behaviors like that.
>
> I agree with that.
>
> My proposal is that if users want to define multiple publications, and
> their definitions conflict in a way that would behave ridiculously (==
> bound to cause data inconsistencies eventually), an error should be
> thrown.  Maybe we will not be able to catch all bogus cases, but we can
> be prepared for the most obvious ones, and patch later when we find
> others.
>

I agree with throwing errors for obvious/known bogus cases, but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1] where it
may be valid to combine them.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1K%2BPkkC6_FDemGMC_i%2BAakx%2B3%3DQG-g4We3BdCK7dK_bgA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Hash index build performance tweak from sorting

2022-04-30 Thread Amit Kapila
On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs
 wrote:
>
> Hash index pages are stored in sorted order, but we don't prepare the
> data correctly.
>
> We sort the data as the first step of a hash index build, but we
> forget to sort the data by hash as well as by hash bucket.
>

I was looking at the nearby comment ("Fetch hash keys and mask off
bits we don't want to sort by.") and it sounds like we purposefully
don't want to sort by the full hash key. I see that this comment was
originally introduced in the below commit:

commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
Author: Tom Lane 
Date:   Mon Sep 15 18:43:41 2008 +

    Change hash indexes to store only the hash code rather than the
    whole indexed value.

But even before that commit, we seem to mask off the bits before
comparison. Is that because we want to preserve the order of hash keys
within a particular bucket, so such masking was required? If so, then
maybe it is okay to compare the hash keys as you are proposing once we
find that the values fall in a particular bucket. Another thing to
note is that this code was changed again in ea69a0dead, but it seems
to follow the intent of the original code.
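
To make that concrete, here is a standalone sketch (not the actual
PostgreSQL code; the PG typedefs are replaced with stdint ones) of the
bucket derivation the masks feed into, plus the tie-break on the full
hash key that Simon is proposing:

  #include <stdint.h>

  /* paraphrase of _hash_hashkey2bucket(): the masks pick the bucket */
  static uint32_t
  hashkey2bucket(uint32_t hashkey, uint32_t maxbucket,
                 uint32_t highmask, uint32_t lowmask)
  {
      uint32_t bucket = hashkey & highmask;

      if (bucket > maxbucket)
          bucket &= lowmask;
      return bucket;
  }

  /* sort by bucket first; within a bucket, fall back to the full
   * hash key (the part the current comparator masks away) */
  static int
  compare_hashes(uint32_t h1, uint32_t h2, uint32_t maxbucket,
                 uint32_t highmask, uint32_t lowmask)
  {
      uint32_t b1 = hashkey2bucket(h1, maxbucket, highmask, lowmask);
      uint32_t b2 = hashkey2bucket(h2, maxbucket, highmask, lowmask);

      if (b1 != b2)
          return (b1 > b2) ? 1 : -1;
      if (h1 != h2)
          return (h1 > h2) ? 1 : -1;
      return 0;
  }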

A few comments on the patch:
1. I think it is better to use DatumGetUInt32 to fetch the hash key,
as the nearby code does.
2. You may want to change the below comment in HSpool
/*
 * We sort the hash keys based on the buckets they belong to. Below masks
 * are used in _hash_hashkey2bucket to determine the bucket of given hash
 * key.
 */

>
> Aside:
>
> I'm not very sure why tuplesort has private code in it dedicated to
> hash indexes, but it does.
>

Are you talking about
tuplesort_begin_index_hash/comparetup_index_hash? I see the
corresponding functions for btree as well in that file.

> Even more strangely, the Tuplesortstate fixes the size of max_buckets
> at tuplesort_begin() time rather than tuplesort_performsort(), forcing
> us to estimate the number of tuples ahead of time rather than using
> the exact number. Next trick would be to alter the APIs to allow exact
> values to be used for sorting, which would allow page at a time
> builds.
>

It is not clear to me what exactly you want to do here but maybe it is
a separate topic and we should discuss this separately.

-- 
With Regards,
Amit Kapila.




Re: Building Postgres with lz4 on Visual Studio

2022-04-30 Thread Andrew Dunstan


On 2022-04-29 Fr 08:50, Andrew Dunstan wrote:
> What I did was to install the packages using vcpkg[1] which is a
> standard framework created by Microsoft for installing package
> libraries. It does install the .lib files in a sane place
> (installdir/lib), but it doesn't use the lib suffix. Also it names the
> lib file for zlib differently.
>

Of course I meant "doesn't use the 'lib' prefix". So the files are named
just zstd.lib and lz4.lib.
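
For reference, the install step was along these lines (a sketch; the
triplet and install root depend on the local vcpkg setup):

  vcpkg install zstd:x64-windows lz4:x64-windows
  # produces <vcpkg-root>/installed/x64-windows/lib/zstd.lib and lz4.lib

so the build scripts need to look for zstd.lib rather than libzstd.lib.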


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Switching XLog source from archive to streaming when primary available

2022-04-30 Thread Bharath Rupireddy
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> Hi Hackers,
>
> When the standby can't connect to the primary, it switches the XLog
> source from streaming to archive and continues in that state until it
> can get the WAL from the archive location. On a server with high WAL
> activity, getting the WAL from the archive is typically slower than
> streaming it from the primary, and the standby can't exit from that
> state. This not only increases the lag on the standby but also
> adversely impacts the primary, as WAL accumulates and vacuum is not
> able to collect the dead tuples. As a mitigation, DBAs can remove or
> advance the slot, or remove the restore_command on the standby, but
> this is manual work I am trying to avoid. I would like to propose the
> following; please let me know your thoughts.
>
> - Automatically attempt to switch the source from archive to streaming
>   when primary_conninfo is set, after replaying 'N' WAL segments,
>   governed by the GUC retry_primary_conn_after_wal_segments.
> - When retry_primary_conn_after_wal_segments is set to -1, the feature
>   is disabled.
> - When the retry attempt fails, switch back to the archive.

I've gone through the state machine in WaitForWALToBecomeAvailable and
I understand it this way: failing to receive WAL records from the
primary causes the current source to switch to archive, and the standby
continues to get WAL records from the archive location; unless some
failure occurs there, the current source is never going to switch back
to streaming. Given that getting WAL from the archive location causes
delays in production environments, we miss the chance to take advantage
of reconnecting to the primary after a previous failed attempt.

So basically, we attempt to switch from archive back to streaming
(even though fetching from the archive can succeed) after a certain
amount of time or number of WAL segments. I prefer a timing-based
switch to streaming instead of one based on the number of WAL segments
fetched from the archive. Right now, wal_retrieve_retry_interval is
used to wait before switching to archive after a failed attempt at
streaming; IMO a similar GUC (one that starts counting once the source
switches from streaming to archive, and on timeout switches back to
streaming) can be used to switch from archive to streaming after the
specified amount of time.
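
As a rough sketch of how that could look in postgresql.conf (the GUC
name below is a placeholder for discussion, not a settled proposal):

  primary_conninfo = 'host=primary port=5432 user=replicator'
  restore_command  = 'cp /archive/%f %p'
  # hypothetical: after this long fetching from the archive, retry
  # streaming from the primary; on failure, fall back to the archive
  streaming_source_retry_interval = '60s'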

Thoughts?

Regards,
Bharath Rupireddy.




Re: Missing can't-assign-to-constant checks in plpgsql

2022-04-30 Thread Tom Lane
Pavel Stehule  writes:
> On Thu, Apr 28, 2022 at 23:52, Tom Lane wrote:
>> Perhaps the OPEN change is a little too aggressive, since if
>> you give the refcursor variable some non-null initial value,
>> OPEN won't change it; in that usage a CONSTANT marking could
>> be allowed.  But I really seriously doubt that anybody out
>> there is marking such variables as constants, so I thought
>> throwing the error at compile time was better than postponing
>> it to runtime so we could handle that.
>> 
>> Regardless of which way we handle that point, I'm inclined to
>> change this only in HEAD.  Probably people wouldn't thank us
>> for making the back branches more strict.

> +1

After sleeping on it, I got cold feet about breaking arguably
legal code, so I made OPEN check at runtime instead.  Which
was probably a good thing anyway, because it made me notice
that exec_stmt_forc() needed a check too.  AFAICS there are no
other places in pl_exec.c that are performing assignments to
variables not checked at parse time.
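
For the record, the arguably-legal usage that a runtime check can
still permit looks like this (an untested sketch):

  DO $$
  DECLARE
    -- non-null initial value: OPEN uses the given portal name and
    -- never assigns to the variable, so CONSTANT is defensible here
    c CONSTANT refcursor := 'my_portal';
  BEGIN
    OPEN c FOR SELECT 1;
    CLOSE c;
  END $$;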

Pushed that way.

regards, tom lane




Re: pgsql: Add contrib/pg_walinspect.

2022-04-30 Thread Jeff Davis
On Fri, 2022-04-29 at 10:46 +0530, Bharath Rupireddy wrote:
> It's not just the flush ptr, without no wait mode, the functions
> would
> wait if start/input lsn is, say current flush lsn - 1 or 2 or more
> (before the previous record) bytes. If the functions were to wait, by
> how much time should they wait? a timeout? forever?

I see, you're talking about the case of XLogFindNextRecord(), not
XLogReadRecord().

XLogFindNextRecord() is the only way to align the user-provided start
LSN on a valid record, but that calls XLogReadRecord(), which may wait
indefinitely. If there were a different entry point that just did the
alignment and skipped past continuation records, we could prevent it
from trying to read the next record if we are already at the flush
pointer. But without some tweak to that API, nowait is still needed.
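
For instance (assuming the committed pg_walinspect interface), the
start LSN handed to a query such as

  SELECT start_lsn, end_lsn, resource_manager, record_length
  FROM pg_get_wal_records_info('0/1000028', pg_current_wal_flush_lsn());

need not fall on a record boundary; XLogFindNextRecord() aligns it to
the first valid record at or after that position.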

Committed your v3 patch with minor modifications.

Regards,
Jeff Davis






Re: bogus: logical replication rows/cols combinations

2022-04-30 Thread Alvaro Herrera
On 2022-Apr-30, Amit Kapila wrote:

> I agree with throwing errors for obvious/known bogus cases, but do we
> want to throw errors or restrict the combining of column lists when
> row filters are present in all cases? See some examples [1] where it
> may be valid to combine them.

I agree we should try to combine things when it is sensible to do so.
Another case that may make sense is two or more publications with
identical column lists but different row filters -- in such cases, as
Tomas suggested, we should combine the filters with OR.
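
A sketch of that case (hypothetical definitions):

  CREATE PUBLICATION p1 FOR TABLE t (a, b) WHERE (a > 100);
  CREATE PUBLICATION p2 FOR TABLE t (a, b) WHERE (a < 0);

Since the column lists are identical, a subscriber to both can simply
copy with WHERE (a > 100) OR (a < 0).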

Also, if only INSERTs are published and not UPDATE/DELETEs, then it
might be sensible to combine everything, regardless of whether or not
the column lists and row filters match.

-- 
Álvaro Herrera        PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"He who admits that he is a coward has courage" (Fernandel)




Re: Building Postgres with lz4 on Visual Studio

2022-04-30 Thread Andres Freund
Hi,

On 2022-04-29 08:50:56 -0400, Andrew Dunstan wrote:
> I agree that we should use perl's -e to test that the files actually
> exists. But I don't think we should try to adjust to everything the zstd
> and lz4 people put in their release files. They are just horribly
> inconsistent.

Right now it's the source of packages we document for windows... It doesn't
seem that crazy to accept a few different paths with a glob or such?


> What I did was to install the packages using vcpkg[1] which is a
> standard framework created by Microsoft for installing package
> libraries. It does install the .lib files in a sane place
> (installdir/lib), but it doesn't use the lib suffix. Also it names the
> lib file for zlib differently.

That doesn't seem much better :(

Greetings,

Andres Freund