Re: Logical replication timeout problem

2022-04-20 Thread Masahiko Sawada
On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila  wrote:
> > On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila  wrote:
> > >
> > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira  wrote:
> > > >
> > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> > > >
> > > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > > personally still prefer the approach implemented in v10 [1]
> > > > especially due to the latest finding by Wang-San that we can't
> > > > update the lag-tracker apart from when it is invoked at the transaction 
> > > > end.
> > > > However, I am fine if we like this approach more.
> > > >
> > > > It seems v15 is simpler and less error prone than v10. v10 has a mix
> > > > of
> > > > OutputPluginUpdateProgress() and the new function update_progress().
> > > > The v10 also calls update_progress() for every change action in
> > > > pgoutput_change(). It is not a good approach for maintainability --
> > > > new changes like sequences need extra calls.
> > > >
> > >
> > > Okay, let's use the v15 approach as Sawada-San also seems to have a
> > > preference for that.
> > >
> > > > However, as you mentioned, it should still handle the lag-tracking case.
> > > >
> > > > Both patches change the OutputPluginUpdateProgress() so it cannot be
> > > > backpatched. Are you planning to backpatch it? If so, the boolean
> > > > variable (last_write or end_xacts depending on which version you are
> > > > considering) could be added to LogicalDecodingContext.
> > > >
> > >
> > > If we add it to LogicalDecodingContext then I think we have to always
> > > reset the variable after its use which will make it look ugly and
> > > error-prone. I was not thinking to backpatch it because of the API
> > > change but I guess if we want to backpatch then we can add it to
> > > LogicalDecodingContext for back-branches. I am not sure if that will
> > > look committable but surely we can try.
> > >
> >
> > Even if we want to add the variable in the struct in back-branches, we 
> > need to
> > ensure not to change the size of the struct as it is exposed, see email [1] 
> > for a
> > similar mistake we made in another case.
> >
> > [1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us
> Thanks for your comments.
>
> I did some checks about adding the new variable in LogicalDecodingContext.
> I found that because of padding, if we add the new variable at the end of the
> structure, it does not change the structure size. I verified this in
> REL_10~REL_14.
>
> So as suggested by Euler-San and Amit-San, I wrote the patch for REL_14.
> Attached is this patch. To prevent patch confusion, the patch for HEAD is also attached.
> The patch for REL_14:
> REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch
> The patch for HEAD:
> v17-0001-Fix-the-logical-replication-timeout-during-large.patch
>
> The following is the details of checks.
> On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable
> in the structure LogicalDecodingContext, I found that there are three blocks of
> padding after the following member variables:
> - 7 bytes after fast_forward
> - 4 bytes after prepared_write
> - 4 bytes after write_xid
>
> If we add the new variable at the end of the structure (bool takes one byte), it
> means we will only consume one byte of padding after the member write_xid. And
> then, at the end of the struct, 3 bytes of padding are still required. For easy
> understanding, please refer to the following simple calculation.
> (In REL14, the size of structure LogicalDecodingContext is 304 bytes.)
> Before adding the new variable (in REL14):
> 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4  =  289 (if padding is not considered)
>  +7  +4  +4  =  +15 (the padding)
> So, the size of structure LogicalDecodingContext is 289+15=304.
> After adding the new variable (in REL14 with patch):
> 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1  =  290 (if padding is not considered)
>  +7  +4+3  =  +14 (the padding)
> So, the size of structure LogicalDecodingContext is 290+14=304.
>
> BTW, the size of structure LogicalDecodingContext in REL_10~REL_13 is 184,
> 200, 200 and 200 bytes respectively. And I found that at the end of the structure
> LogicalDecodingContext, there are always the following members:
> ```
> XLogRecPtr  write_location;   --> 8
> TransactionId write_xid;  --> 4
>   --> There are 4 padding after write_xid.
> ```

I'm concerned that this 4-byte padding at the end of the struct could
depend on platforms (there might be no padding in 32-bit platforms?).
It seems to me that it's better to put it after fast_forward where the
new field should fall within the padding space.
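For illustration, here is a minimal, self-contained sketch (a hypothetical two-member struct, not the real LogicalDecodingContext) of why a bool that lands in existing padding leaves the struct size and the offsets of later members unchanged:

/*
 * Minimal sketch (hypothetical struct, not the real LogicalDecodingContext):
 * on a typical LP64 platform the bool member is followed by 7 bytes of
 * padding, so adding a second bool right after it consumes padding and the
 * struct size and the offsets of later members stay the same.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct before
{
    bool    fast_forward;   /* 1 byte + 7 bytes of padding on LP64 */
    void   *some_pointer;   /* 8 bytes */
};

struct after
{
    bool    fast_forward;   /* 1 byte */
    bool    end_xact;       /* new field, taken from the padding */
    void   *some_pointer;   /* 8 bytes, offset unchanged */
};

int
main(void)
{
    assert(sizeof(struct before) == sizeof(struct after));
    assert(offsetof(struct before, some_pointer) ==
           offsetof(struct after, some_pointer));
    return 0;
}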

BTW the changes in
REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch,
adding end_xact to LogicalDecodingContext, seem good to me.

Re: Bad estimate with partial index

2022-04-20 Thread Tomas Vondra
On 4/19/22 23:08, Tom Lane wrote:
> I wrote:
>> it looks like the problem is that the extended stats haven't been used
>> while forming the estimate of the number of index entries retrieved, so
>> we overestimate the cost of using this index.
>> That seems like a bug.  Tomas?
> 
> I dug into this enough to locate the source of the problem.
> btcostestimate includes the partial index clauses in what it
> sends to clauselist_selectivity, but per the comments for
> add_predicate_to_index_quals:
> 
>  * Note that indexQuals contains RestrictInfo nodes while the indpred
>  * does not, so the output list will be mixed.  This is OK for both
>  * predicate_implied_by() and clauselist_selectivity(), but might be
>  * problematic if the result were passed to other things.
> 
> That comment was true when it was written, but it's been falsified
> by the extended-stats patches, which have added a whole lot of logic
> in and under clauselist_selectivity that ignores clauses that are not
> RestrictInfos.
> 
> While we could perhaps fix this by having add_predicate_to_index_quals
> add RestrictInfos, I'm inclined to feel that the extended-stats code
> is in the wrong.  The contract for clauselist_selectivity has always
> been that it could optimize if given RestrictInfos rather than bare
> clauses, not that it would fail to work entirely without them.
> There are probably more places besides add_predicate_to_index_quals
> that are relying on that.
> 

Yes, that seems like a fair assessment. I'll look into fixing this, not
sure how invasive it will get, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Odd off-by-one dirty buffers and checkpoint buffers written

2022-04-20 Thread Kyotaro Horiguchi
At Tue, 19 Apr 2022 17:51:24 -0700, "David G. Johnston" 
 wrote in 
> On Tue, Apr 19, 2022 at 4:36 PM Nathan Bossart 
> wrote:
> 
> > On Tue, Apr 19, 2022 at 04:21:21PM -0700, David G. Johnston wrote:
> > > I've done this four times in a row and while the number of dirty buffers
> > > shown each time varies (see below), I see that "wrote N buffers" is always
> > > exactly one more than the total count of dirty buffers.  I'm just curious
> > > if anyone has a quick answer for this unusual correspondence.
> >
> > I see that SlruInternalWritePage() increments ckpt_bufs_written, so my
> > first guess would be that it's due to something like CheckPointCLOG().
> >
> >
> I peeked at pg_stat_bgwriter and see an increase in buffers_checkpoint
> matching the dirty buffers number.
> 
> I also looked at pg_stat_slru to try and find the corresponding change
> caused by:
> 
> slru.c:766 (SlruPhysicalWritePage)
> pgstat_count_slru_page_written(shared->slru_stats_idx);
> 
> I do see (Xact) blks_hit change during this process (after the
> update/commit, not the checkpoint, though) but it increases by 2 when dirty
> buffers is 4.  I was expecting 4, thinking that blocks and buffers and
> pages are basically the same things (which [1] seems to affirm).
> 
> https://www.postgresql.org/message-id/13563.1044552279%40sss.pgh.pa.us

If I understand your point correctly..

The Xact SLRU is the so-called CLOG, on which transaction statuses
(running/committed/aborted) are recorded.  Its pages are separate
objects from table pages and are out of sight of pg_buffercache.
However, the same relationship between pages, blocks and buffers
applies to both cases.

The reason for the 2 hits of the Xact SLRU is that it is read once for the
visibility (MVCC) check and once more for the commit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add --{no-,}bypassrls flags to createuser

2022-04-20 Thread Kyotaro Horiguchi
At Tue, 19 Apr 2022 12:13:51 -0400, Robert Haas  wrote 
in 
> On Mon, Apr 18, 2022 at 9:50 PM Kyotaro Horiguchi
>  wrote:
> > Hmm.. So, "-r/--role" and "-m/--member(ship)" is the (least bad) way
> > to go?  Or we can give up adding -m on the grounds that it is hard to
> > name..
> 
> Hmm, yeah, I hadn't quite realized what the problem was when I wrote
> that. I honestly don't know what to do about that. Renaming the
> existing option is not great, but having the syntax diverge between
> SQL and CLI is not great either. Giving up is also not great. Not sure
> what is best.

Exactly..  So I'm stuck X(

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Perform streaming logical transactions by background workers and parallel apply

2022-04-20 Thread houzj.f...@fujitsu.com
On Tuesday, April 19, 2022 2:58 PM Amit Kapila  wrote:
> 
> On Thu, Apr 14, 2022 at 9:12 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Friday, April 8, 2022 5:14 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > Attached is a new version patch which improved the error handling and handled
> > the case when there is no more worker available (it will spill the data to
> > the temp file in this case).
> >
> > Currently, it still doesn't support skipping the streamed transaction in
> > bgworker, because in this approach we don't know the last lsn for the
> > streamed transaction being applied, so we cannot get the lsn to SKIP. I
> > will think more about it and keep testing the patch.
> >
> 
> I think we can avoid performing the streaming transaction by bgworker
> if skip_lsn is set. This needs some more thought but anyway I see
> another problem in this patch. I think we won't be able to make the
> decision whether to apply the change for a relation that is not in the
> 'READY' state (see should_apply_changes_for_rel) as we won't know
> 'remote_final_lsn' by that time for streaming transactions. I think
> what we can do here is that, before assigning the transaction to
> bgworker, we can check if any of the rels is not in the 'READY' state;
> if so, we can make the transaction spill the changes as we are doing now.
> Even if we do such a check, it is still possible that some rel on
> which this transaction is performing an operation can appear to be in
> 'non-ready' state after starting bgworker, and for such a case I think
> we need to give an error and restart the transaction, as we have no way to
> know whether we need to perform an operation on the 'rel'. This is
> possible if the user performs REFRESH PUBLICATION in parallel to this
> transaction as that can add a new rel to the pg_subscription_rel.

Changed as suggested.

Attached is the new version patch which cleans up some code and fixes the above
problem. For now, it won't apply a streaming transaction in a bgworker if skiplsn
is set or any table is not in the 'READY' state.

Besides, the subscription streaming option is extended to ('on/off/apply (apply in
bgworker)/spool (spool to file)') so that the user can control whether to apply the
transaction in a bgworker.

Best regards,
Hou zj


v3-0001-Perform-streaming-logical-transactions-by-background.patch
Description:  v3-0001-Perform-streaming-logical-transactions-by-background.patch


Re: Logical replication timeout problem

2022-04-20 Thread Amit Kapila
On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada  wrote:
>
> On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com
>  wrote:
> > ```
>
> I'm concerned that this 4-byte padding at the end of the struct could
> depend on platforms (there might be no padding in 32-bit platforms?).
>

Good point, but ...

> It seems to me that it's better to put it after fast_forward where the
> new field should fall within the padding space.
>

Can we add the variable in between the existing variables in the
structure in the back branches? Normally, we add at the end to avoid
any breakage of existing apps. See commit 56e366f675 and discussion at
[1]. That is related to enum but I think we follow the same for
structures.

[1] - 
https://www.postgresql.org/message-id/7dab0929-a966-0c0a-4726-878fced2fe00%40enterprisedb.com
-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add native windows on arm64 support

2022-04-20 Thread Niyas Sait
> Have you tested with the amount of coverage provided by vcregress.pl?

I built and ran the relevant tests with the help of run_build.pl.

I think the following tests are executed - check, contribcheck, ecpgcheck,
installcheck, isolationcheck, modulescheck, and upgradecheck.

> Another thing I was wondering about is if it would be possible to have
> this option in Travis, but that does not seem to be the case:
> https://docs.travis-ci.com/user/reference/windows/#windows-version

Yes, I think Travis doesn't yet support the Windows/Arm64 platform.

> +   if ($solution->{platform} eq 'ARM64') {
> +   push(@pgportfiles, 'pg_crc32c_armv8_choose.c');
> +   push(@pgportfiles, 'pg_crc32c_armv8.c');
> +   } else {
> +   push(@pgportfiles, 'pg_crc32c_sse42_choose.c');
> +   push(@pgportfiles, 'pg_crc32c_sse42.c');
> +   }
>
> +++ b/src/port/pg_crc32c_armv8.c
> +#ifndef _MSC_VER
>  #include <arm_acle.h>
> +#endif
> [ ... ]
> +#ifdef _M_ARM64
> +   /*
> +* arm64 way of hinting processor for spin loops optimisations
> +* ref:
https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq
> +*/
> +   __isb(_ARM64_BARRIER_SY);
> +#else
> I think that such extra optimizations had better be in a separate
> patch, and we should focus on getting the build done first.

> This would mean that a basic patch could be done with just the changes
> for gendef.pl, and the first part of the changes in MSBuildProject.pm.
> Would that not be enough?

I cannot build without any of the above changes. Nothing specific to
optimization is added to this patch.

> +   # arm64 linker only supports dynamic base address
> +   my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' :
'false';
> This issue is still lying around, and you may have been lucky.  Would
> there be any issues to remove this change to get a basic support in?
> As mentioned upthread, there is a long history of Postgres with ASLR.

The MSVC linker doesn't allow non-random base addresses for Arm64 platforms,
so this change is needed for basic support.

Niyas


Re: Dump/Restore of non-default PKs

2022-04-20 Thread Simon Riggs
On Wed, 20 Apr 2022 at 03:05, David G. Johnston
 wrote:

> https://www.postgresql.org/docs/current/sql-altertable.html
> ADD table_constraint_using_index
> ...This form is not currently supported on partitioned tables.

Good to know, thanks very much for pointing it out.

That needs to be fixed before we can progress further on this thread.
Will look into it.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Logical replication timeout problem

2022-04-20 Thread Amit Kapila
On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila  wrote:
>
> On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com
> >  wrote:
> > > ```
> >
> > I'm concerned that this 4-byte padding at the end of the struct could
> > depend on platforms (there might be no padding in 32-bit platforms?).
> >
>
> Good point, but ...
>
> > It seems to me that it's better to put it after fast_forward where the
> > new field should fall within the padding space.
> >
>
> Can we add the variable in between the existing variables in the
> structure in the back branches?
>

I think it should be fine if it falls in the padding space. We have
done similar changes recently in back-branches [1]. I think it would
then be better to have it in the same place in HEAD as well?

[1] - 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346876aad4941797c2255a21bdac74739

-- 
With Regards,
Amit Kapila.




Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508

2022-04-20 Thread Erik Rijkers

On 20-04-2022 06:54, Kyotaro Horiguchi wrote:

At Tue, 19 Apr 2022 10:55:26 -0700, Andres Freund  wrote in

Hi,

On 2022-04-19 10:36:24 -0700, Andres Freund wrote:

On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote:

The 12th run of statbug.sh crashed and gave a corefile.


I ran through quite a few iterations by now, without reproducing :(

I guess there's some timing issue and you're hitting on your system
due to the slower disks.


Ah. I found the issue. The new pgstat_report_stat(true) call in
LogicalRepApplyLoop()'s "timeout" section doesn't check if we're in a
transaction. And the transactional stats code doesn't handle that (never
has).

I think all that's needed is an if (IsTransactionState()) around that
pgstat_report_stat().


if (!IsTransactionState()) ?


It might be possible to put an assertion into pgstat_report_stat(), but
I need to look at the process exit code to see if it is.


Inserting a sleep in pgoutput_commit_txn reproduced this. It crashes with
the same stack trace and a similar variable state.

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index b197bfd565..def4d751d3 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -568,6 +568,7 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 		return;
 	}
 
+	sleep(2);
 
 	OutputPluginPrepareWrite(ctx, true);
 	logicalrep_write_commit(ctx->out, txn, commit_lsn);
 	OutputPluginWrite(ctx, true);

The following actually works for this.

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 4171371296..f4e5359513 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2882,10 +2882,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 			send_feedback(last_received, requestReply, requestReply);
 
 			/*
-			 * Force reporting to ensure long idle periods don't lead to
-			 * arbitrarily delayed stats.
+			 * Force reporting to ensure long out-of-transaction idle periods
+			 * don't lead to arbitrarily delayed stats.
 			 */
-			pgstat_report_stat(true);
+			if (!IsTransactionState())
+				pgstat_report_stat(true);
 		}
 	}
 


Yes, that seems to fix it: I applied that latter patch, and ran my 
program 250x without errors. Then I removed it again and it gave the 
error within 15x.


thanks!

Erik



regards.






Re: minor MERGE cleanups

2022-04-20 Thread Alvaro Herrera
On 2022-Apr-20, Michael Paquier wrote:

> On Tue, Apr 19, 2022 at 03:45:22PM +0200, Alvaro Herrera wrote:
> > I expect these fixups in new code should be uncontroversial.
> 
> The whole set looks rather sane to me.

Thank you, I have pushed them.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: typos

2022-04-20 Thread Alvaro Herrera
On 2022-Apr-20, Amit Kapila wrote:

> Your proposed changes look good to me but I think all these places
> need to mention 'column list' as well because the behavior is the same
> for it.

Hmm, you're right.  Added that, and changed the wording somewhat because
some things read awkwardly.  Here's the result in patch form.

Column lists don't seem to be mentioned in logical-replication.sgml, either.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 23d883c158..c58478e8f8 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -79,7 +79,8 @@ CREATE PUBLICATION name
  
 
  
-  If the optional WHERE clause is specified, rows for
+  If the optional WHERE clause is specified, it defines a
+  row filter expression. Rows for
   which the expression
   evaluates to false or null will not be published. Note that parentheses
   are required around the expression. It has no effect on
@@ -192,6 +193,11 @@ CREATE PUBLICATION name
   consisting of a different set of partitions.
  
 
+ 
+  This parameter also affects how row filters and column lists are
+  chosen for partitions; see below for details.
+ 
+
  
   If this is enabled, TRUNCATE operations performed
   directly on partitions are not replicated.
@@ -241,21 +247,28 @@ CREATE PUBLICATION name
   
 
   
-   A WHERE (i.e. row filter) expression must contain only
+   A row filter expression (i.e., the WHERE clause) must contain only
columns that are covered by the REPLICA IDENTITY, in
order for UPDATE and DELETE operations
to be published. For publication of INSERT operations,
any column may be used in the WHERE expression. The
-   WHERE clause allows simple expressions that don't have
+   row filter allows simple expressions that don't have
user-defined functions, user-defined operators, user-defined types,
user-defined collations, non-immutable built-in functions, or references to
system columns.
-   If your publication contains a partitioned table, the publication parameter
-   publish_via_partition_root determines if it uses the
-   partition's row filter (if the parameter is false, the default) or the root
-   partitioned table's row filter.
+  
+  
+  
+   For published partitioned tables, the row filter for each
+   partition is taken from the published partitioned table if the
+   publication parameter publish_via_partition_root is true,
+   or from the partition itself if it is false (the default).
See  for details about row
filters.
+   Similarly, for published partitioned tables, the column list for each
+   partition is taken from the published partitioned table if the
+   publication parameter publish_via_partition_root is true,
+   or from the partition itself if it is false.
   
 
   


Re: generalized conveyor belt storage

2022-04-20 Thread Alvaro Herrera
What's with the free text in cbstorage.h?  I would guess that this
wouldn't even compile, and nobody has noticed because the file is not
included by anything yet ...

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)




RE: Perform streaming logical transactions by background workers and parallel apply

2022-04-20 Thread houzj.f...@fujitsu.com
On Wednesday, April 20, 2022 4:57 PM houzj.f...@fujitsu.com wrote:
> 
> On Tuesday, April 19, 2022 2:58 PM Amit Kapila 
> wrote:
> >
> > On Thu, Apr 14, 2022 at 9:12 AM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Friday, April 8, 2022 5:14 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > Attached is a new version patch which improved the error handling and
> > > handled the case when there is no more worker available (it will spill
> > > the data to the temp file in this case).
> > >
> > > Currently, it still doesn't support skipping the streamed transaction in
> > > bgworker, because in this approach we don't know the last lsn for the
> > > streamed transaction being applied, so we cannot get the lsn to SKIP. I
> > > will think more about it and keep testing the patch.
> > >
> >
> > I think we can avoid performing the streaming transaction by bgworker
> > if skip_lsn is set. This needs some more thought but anyway I see
> > another problem in this patch. I think we won't be able to make the
> > decision whether to apply the change for a relation that is not in the
> > 'READY' state (see should_apply_changes_for_rel) as we won't know
> > 'remote_final_lsn' by that time for streaming transactions. I think
> > what we can do here is that, before assigning the transaction to
> > bgworker, we can check if any of the rels is not in the 'READY' state;
> > if so, we can make the transaction spill the changes as we are doing now.
> > Even if we do such a check, it is still possible that some rel on
> > which this transaction is performing an operation can appear to be in
> > 'non-ready' state after starting bgworker, and for such a case I think
> > we need to give an error and restart the transaction, as we have no way to
> > know whether we need to perform an operation on the 'rel'. This is
> > possible if the user performs REFRESH PUBLICATION in parallel to this
> > transaction as that can add a new rel to the pg_subscription_rel.
> 
> Changed as suggested.
> 
> Attached is the new version patch which cleans up some code and fixes the above
> problem. For now, it won't apply a streaming transaction in a bgworker if skiplsn
> is set or any table is not in the 'READY' state.
> 
> Besides, the subscription streaming option is extended to ('on/off/apply (apply in
> bgworker)/spool (spool to file)') so that the user can control whether to apply the
> transaction in a bgworker.

Sorry, there was a mistake in the pg_dump test case which caused a failure in the CFbot.
Attached is a new version patch which fixes that.

Best regards,
Hou zj


v4-0001-Perform-streaming-logical-transactions-by-background.patch
Description:  v4-0001-Perform-streaming-logical-transactions-by-background.patch


Re: generalized conveyor belt storage

2022-04-20 Thread Thom Brown
On Wed, 20 Apr 2022 at 13:02, Alvaro Herrera  wrote:
>
> What's with the free text in cbstorage.h?  I would guess that this
> wouldn't even compile, and nobody has noticed because the file is not
> included by anything yet ...

I'm not able to compile:

cbfsmpage.c: In function ‘cb_fsmpage_initialize’:
cbfsmpage.c:34:11: warning: unused variable ‘fsm_block_spacing’
[-Wunused-variable]
  unsigned fsm_block_spacing = cb_fsm_block_spacing(pages_per_segment);
   ^
cbfsmpage.c:33:14: warning: unused variable ‘first_fsm_block’
[-Wunused-variable]
  BlockNumber first_fsm_block = cb_first_fsm_block(pages_per_segment);
...
cbxlog.c: In function ‘cb_xlog_allocate_payload_segment’:
cbxlog.c:70:24: error: void value not ignored as it ought to be
  bool  have_fsm_page = XLogRecGetBlockTag(record, 1, NULL, NULL, NULL);
^
cbxlog.c: In function ‘cb_xlog_allocate_index_segment’:
cbxlog.c:123:17: error: void value not ignored as it ought to be
  have_prev_page = XLogRecGetBlockTag(record, 2, NULL, NULL, NULL);
 ^
cbxlog.c:124:16: error: void value not ignored as it ought to be
  have_fsm_page = XLogRecGetBlockTag(record, 3, NULL, NULL, NULL);
^
cbxlog.c: In function ‘cb_xlog_recycle_payload_segment’:
cbxlog.c:311:16: error: void value not ignored as it ought to be
  have_metapage = XLogRecGetBlockTag(record, 0, NULL, NULL, NULL);
^
cbxlog.c:312:18: error: void value not ignored as it ought to be
  have_index_page = XLogRecGetBlockTag(record, 1, NULL, NULL, NULL);
  ^
cbxlog.c:313:16: error: void value not ignored as it ought to be
  have_fsm_page = XLogRecGetBlockTag(record, 2, NULL, NULL, NULL);
^
make[4]: *** [cbxlog.o] Error 1
make[4]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access/conveyor'
make[3]: *** [conveyor-recursive] Error 2
make[3]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access'
make[2]: *** [access-recursive] Error 2


-- 
Thom




RE: Documentation issue with pg_stat_recovery_prefetch

2022-04-20 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi, 

Thank you for developing the new feature.
The pg_stat_recovery_prefetch view documentation doesn't seem to have a 
description of the stats_reset column. The attached small patch adds a 
description of the stats_reset column.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Thomas Munro  
Sent: Tuesday, April 12, 2022 6:23 PM
To: sirisha chamarthi 
Cc: PostgreSQL Hackers 
Subject: Re: Documentation issue with pg_stat_recovery_prefetch

On Tue, Apr 12, 2022 at 8:11 AM sirisha chamarthi  
wrote:
> I was going through pg_stat_recovery_prefetch documentation and saw an issue 
> with formatting. Attached a small patch to fix the issue. This is the first 
> time I am sending an email to hackers. Please educate me if I miss something.

Thanks Sirisha!

Ouch, that's embarrassing.  My best guess is that I might have screwed that up 
a long time ago while rebasing an early development version over commit 
92f94686, which changed the link style and moved paragraphs around, and then 
never noticed that it was wrong.
Researching that made me notice another problem: the table was using the 3 
column layout from a couple of years ago, because I had also missed the style 
change in commit a0427506.  Oops.  Fixed.




pg_stat_recovery_prefetch_doc_v1.diff
Description: pg_stat_recovery_prefetch_doc_v1.diff


Re: Logical replication timeout problem

2022-04-20 Thread Masahiko Sawada
On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila  wrote:
>
> On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila  wrote:
> >
> > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com
> > >  wrote:
> > > > ```
> > >
> > > I'm concerned that this 4-byte padding at the end of the struct could
> > > depend on platforms (there might be no padding in 32-bit platforms?).
> > >
> >
> > Good point, but ...
> >
> > > It seems to me that it's better to put it after fast_forward where the
> > > new field should fall within the padding space.
> > >
> >
> > Can we add the variable in between the existing variables in the
> > structure in the back branches?
> >
>
> I think it should be fine if it falls in the padding space. We have
> done similar changes recently in back-branches [1].

Yes.

> I think it would
> be then better to have it in the same place in HEAD as well?

As far as I can see in the v17 patch, which is for HEAD, we don't add
a variable to LogicalDecodingContext, but did you refer to another
patch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [RFC] building postgres with meson -v8

2022-04-20 Thread Peter Eisentraut

On 13.04.22 12:26, Peter Eisentraut wrote:
Some feedback and patches for your branch at 
3274198960c139328fef3c725cee1468bbfff469:


Here is another patch.  It adds support for building ecpg.

From 35a23442727cdf82558c7e5eab85bc29df86b5d5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 20 Apr 2022 10:54:29 +0200
Subject: [PATCH 6/6] meson: Add ecpg

---
 meson.build|  5 ++
 src/interfaces/ecpg/compatlib/meson.build  |  8 
 src/interfaces/ecpg/ecpglib/meson.build| 17 +++
 src/interfaces/ecpg/include/meson.build| 46 ++
 src/interfaces/ecpg/meson.build|  5 ++
 src/interfaces/ecpg/pgtypeslib/meson.build | 12 +
 src/interfaces/ecpg/preproc/Makefile   |  2 +-
 src/interfaces/ecpg/preproc/meson.build| 55 ++
 src/interfaces/ecpg/preproc/parse.pl   |  6 ++-
 src/interfaces/meson.build |  1 +
 src/meson.build|  2 +
 11 files changed, 157 insertions(+), 2 deletions(-)
 create mode 100644 src/interfaces/ecpg/compatlib/meson.build
 create mode 100644 src/interfaces/ecpg/ecpglib/meson.build
 create mode 100644 src/interfaces/ecpg/include/meson.build
 create mode 100644 src/interfaces/ecpg/meson.build
 create mode 100644 src/interfaces/ecpg/pgtypeslib/meson.build
 create mode 100644 src/interfaces/ecpg/preproc/meson.build
 create mode 100644 src/interfaces/meson.build

diff --git a/meson.build b/meson.build
index e33ed11b08..bd6f1759e8 100644
--- a/meson.build
+++ b/meson.build
@@ -1637,6 +1637,11 @@ else
   cdata.set('STRERROR_R_INT', false)
 endif
 
+# XXX replace with real test
+if host_machine.system() == 'darwin'
+  cdata.set('LOCALE_T_IN_XLOCALE', true)
+endif
+
 # MSVC doesn't cope well with defining restrict to __restrict, the
 # spelling it understands, because it conflicts with
 # __declspec(restrict). Therefore we define pg_restrict to the
diff --git a/src/interfaces/ecpg/compatlib/meson.build b/src/interfaces/ecpg/compatlib/meson.build
new file mode 100644
index 00..68ae270038
--- /dev/null
+++ b/src/interfaces/ecpg/compatlib/meson.build
@@ -0,0 +1,8 @@
+library('ecpg_compat',
+'informix.c',
+include_directories : ['.', '../include', postgres_inc, '../../libpq'],
+dependencies: [frontend_shlib_code, thread_dep],
+link_with: [ecpglib, ecpg_pgtypes],
+soversion: 3,
+install: true,
+   )
diff --git a/src/interfaces/ecpg/ecpglib/meson.build b/src/interfaces/ecpg/ecpglib/meson.build
new file mode 100644
index 00..c32fc13871
--- /dev/null
+++ b/src/interfaces/ecpg/ecpglib/meson.build
@@ -0,0 +1,17 @@
+ecpglib = library('ecpg',
+  'connect.c',
+  'data.c',
+  'descriptor.c',
+  'error.c',
+  'execute.c',
+  'memory.c',
+  'misc.c',
+  'prepare.c',
+  'sqlda.c',
+  'typename.c',
+  include_directories : ['.', '../include', postgres_inc],
+  dependencies: [frontend_shlib_code, libpq, thread_dep],
+  link_with: [ecpg_pgtypes],
+  soversion: 6,
+  install: true,
+ )
diff --git a/src/interfaces/ecpg/include/meson.build b/src/interfaces/ecpg/include/meson.build
new file mode 100644
index 00..840a23998d
--- /dev/null
+++ b/src/interfaces/ecpg/include/meson.build
@@ -0,0 +1,46 @@
+ecpg_conf_keys = [
+  'ENABLE_THREAD_SAFETY',
+  'HAVE_INT64',
+  'HAVE_LONG_INT_64',
+  'HAVE_LONG_LONG_INT_64',
+  'PG_USE_STDBOOL',
+]
+
+ecpg_conf_data = configuration_data()
+
+foreach key : ecpg_conf_keys
+  if cdata.has(key)
+ecpg_conf_data.set(key, cdata.get(key))
+  endif
+endforeach
+
+configure_file(
+  output: 'ecpg_config.h',
+  configuration: ecpg_conf_data,
+  install_dir: get_option('includedir'),
+)
+
+install_headers(
+  'ecpg_informix.h',
+  'ecpgerrno.h',
+  'ecpglib.h',
+  'ecpgtype.h',
+  'pgtypes.h',
+  'pgtypes_date.h',
+  'pgtypes_error.h',
+  'pgtypes_interval.h',
+  'pgtypes_numeric.h',
+  'pgtypes_timestamp.h',
+  'sql3types.h',
+  'sqlca.h',
+  'sqlda.h',
+  'sqlda-compat.h',
+  'sqlda-native.h',
+)
+
+install_headers(
+  'datetime.h',
+  'decimal.h',
+  'sqltypes.h',
+  subdir: 'informix/esql',
+)
diff --git a/src/interfaces/ecpg/meson.build b/src/interfaces/ecpg/meson.build
new file mode 100644
index 00..ffbe84c0e8
--- /dev/null
+++ b/src/interfaces/ecpg/meson.build
@@ -0,0 +1,5 @@
+subdir('include')
+subdir('pgtypeslib')
+subdir('ecpglib')
+subdir('compatlib')
+subdir('preproc')
diff --git a/src/interfaces/ecpg/pgtypeslib/meson.build b/src/interfaces/ecpg/pgtypeslib/meson.build
new file mode 100644
index 00..9b1d54c019
--- /dev/null
+++ b/src/interfaces/ecpg/pgtypeslib/meson.build
@@ -0,0 +1,12 @@
+ecpg_pgtypes = library('pgtypes',
+   'common.c',
+   'datetime.c'

Re: using an end-of-recovery record in all cases

2022-04-20 Thread Robert Haas
On Tue, Apr 19, 2022 at 4:38 PM Nathan Bossart  wrote:
> Shouldn't latestCompletedXid be set to MaxTransactionId in this case?  Or
> is this related to the logic in FullTransactionIdRetreat() that avoids
> skipping over the "actual" special transaction IDs?

The problem here is this code:

/* also initialize latestCompletedXid, to nextXid - 1 */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
LWLockRelease(ProcArrayLock);

If nextXid is 3, then latestCompletedXid gets 2. But in
GetRunningTransactionData:

Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
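For context, here is a small self-contained sketch of why that assertion trips; the special XID values are restated from transam.h so the example compiles on its own:

#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* special XID values, restated from transam.h */
#define InvalidTransactionId        ((TransactionId) 0)
#define BootstrapTransactionId      ((TransactionId) 1)
#define FrozenTransactionId         ((TransactionId) 2)
#define FirstNormalTransactionId    ((TransactionId) 3)

/* only values >= FirstNormalTransactionId count as "normal" */
#define TransactionIdIsNormal(xid)  ((xid) >= FirstNormalTransactionId)

int
main(void)
{
    TransactionId nextXid = 3;
    TransactionId latestCompletedXid = nextXid - 1;    /* retreat by one */

    assert(latestCompletedXid == FrozenTransactionId);
    /* this is the condition the Assert in GetRunningTransactionData checks */
    assert(!TransactionIdIsNormal(latestCompletedXid));
    return 0;
}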

> Your reasoning seems sound to me.

I was talking with Thomas Munro yesterday and he thinks there is a
problem with relfilenode reuse here. In normal running, when a
relation is dropped, we leave behind a 0-length file until the next
checkpoint; this keeps that relfilenode from being used even if the
OID counter wraps around. If we didn't do that, then imagine that
while running with wal_level=minimal, we drop an existing relation,
create a new relation with the same OID, load some data into it, and
crash, all within the same checkpoint cycle: we will be able to
replay the drop, but we will not be able to restore the relation
contents afterward because at wal_level=minimal they are not logged.
Apparently, we don't create tombstone files during recovery because we
know that there will be a checkpoint at the end.

With the existing use of the end-of-recovery record, we always know
that wal_level>minimal, because we're only using it on standbys. But
with this use that wouldn't be true any more. So I guess we need to
start creating tombstone files even during recovery, or else do
something like what Dilip coded up in
http://postgr.es/m/CAFiTN-u=r8UTCSzu6_pnihYAtwR1=esq5sregtez2tla92f...@mail.gmail.com
which I think would be a better solution at least in the long term.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bad estimate with partial index

2022-04-20 Thread Tomas Vondra


On 4/20/22 09:58, Tomas Vondra wrote:
> On 4/19/22 23:08, Tom Lane wrote:
>> I wrote:
>>> it looks like the problem is that the extended stats haven't been used
>>> while forming the estimate of the number of index entries retrieved, so
>>> we overestimate the cost of using this index.
>>> That seems like a bug.  Tomas?
>>
>> I dug into this enough to locate the source of the problem.
>> btcostestimate includes the partial index clauses in what it
>> sends to clauselist_selectivity, but per the comments for
>> add_predicate_to_index_quals:
>>
>>  * Note that indexQuals contains RestrictInfo nodes while the indpred
>>  * does not, so the output list will be mixed.  This is OK for both
>>  * predicate_implied_by() and clauselist_selectivity(), but might be
>>  * problematic if the result were passed to other things.
>>
>> That comment was true when it was written, but it's been falsified
>> by the extended-stats patches, which have added a whole lot of logic
>> in and under clauselist_selectivity that ignores clauses that are not
>> RestrictInfos.
>>
>> While we could perhaps fix this by having add_predicate_to_index_quals
>> add RestrictInfos, I'm inclined to feel that the extended-stats code
>> is in the wrong.  The contract for clauselist_selectivity has always
>> been that it could optimize if given RestrictInfos rather than bare
>> clauses, not that it would fail to work entirely without them.
>> There are probably more places besides add_predicate_to_index_quals
>> that are relying on that.
>>
> 
> Yes, that seems like a fair assessment. I'll look into fixing this, not
> sure how invasive it will get, though.
> 

So, here's a WIP fix that improves the example shared by Andre, and does
not seem to break anything (or at least not any regression test).

The whole idea is that instead of bailing out for the non-RestrictInfo case,
it calculates the necessary information for the clause from scratch.
This means relids and pseudoconstant flag, which are checked to decide
if the clause is compatible with extended stats.

But when inspecting how to calculate pseudoconstant, I realized that
maybe that's not really needed. Per distribute_qual_to_rels() we only
set it to 'true' when bms_is_empty(relids), and we already check that
relids is a singleton, so it can't be empty - which means pseudoconstant
can't be true either.


Andre, are you in a position to test this fix with your application? Which
Postgres version are you using, actually?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 06f836308d0..fd151c21815 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -526,6 +526,7 @@ find_single_rel_for_clauses(PlannerInfo *root, List *clauses)
 {
 	int			lastrelid = 0;
 	ListCell   *l;
+	Bitmapset  *clause_relids;
 
 	foreach(l, clauses)
 	{
@@ -560,12 +561,15 @@ find_single_rel_for_clauses(PlannerInfo *root, List *clauses)
 			continue;
 		}
 
-		if (!IsA(rinfo, RestrictInfo))
-			return NULL;
+		if (IsA(rinfo, RestrictInfo))
+			clause_relids = rinfo->clause_relids;
+		else
+			clause_relids = pull_varnos(root, (Node *) rinfo);
+
 
-		if (bms_is_empty(rinfo->clause_relids))
+		if (bms_is_empty(clause_relids))
 			continue;			/* we can ignore variable-free clauses */
-		if (!bms_get_singleton_member(rinfo->clause_relids, &relid))
+		if (!bms_get_singleton_member(clause_relids, &relid))
 			return NULL;		/* multiple relations in this clause */
 		if (lastrelid == 0)
 			lastrelid = relid;	/* first clause referencing a relation */
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ca48395d5c5..90809fbcd8e 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1560,8 +1560,6 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 			 Bitmapset **attnums, List **exprs)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
-	RestrictInfo *rinfo = (RestrictInfo *) clause;
-	int			clause_relid;
 	Oid			userid;
 
 	/*
@@ -1588,21 +1586,41 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		return true;
 	}
 
-	/* Otherwise it must be a RestrictInfo. */
-	if (!IsA(rinfo, RestrictInfo))
-		return false;
+	/* */
+	if (IsA(clause, RestrictInfo))
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) clause;
+		int			clause_relid;
 
-	/* Pseudoconstants are not really interesting here. */
-	if (rinfo->pseudoconstant)
-		return false;
+		/* Pseudoconstants are not really interesting here. */
+		if (rinfo->pseudoconstant)
+			return false;
 
-	/* Clauses referencing other varnos are incompatible. */
-	if (!bms_get_singleton_member(rinfo->clause_relids, &clause_relid) ||
-		clause_relid != relid)
-		return false;
+		/* Clauses referencing other varnos are incompatible. */
+		if (!bms_get_single

Re: Odd off-by-one dirty buffers and checkpoint buffers written

2022-04-20 Thread David G. Johnston
On Wed, Apr 20, 2022 at 1:03 AM Kyotaro Horiguchi 
wrote:

>
> The reason for the 2 hits of Xact SLRU is that once for visibility
> (MVCC) check and another for commit.
>
>
Makes sense.  Thanks.  Now, is the lack of such a detail when looking at
pg_stat_slru (for this and the other 6 named caches) an omission by intent,
or is it just that no one has taken the time to write up what the different
caches hold?  I would think a brief sentence for each followed by a link to the
main section describing the feature would be decent content to add to the
introduction for the view in 28.2.21.

Also, is "other" ever expected to be something besides all zeros?

Thanks!

David J.


Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-20 Thread bu...@sohu.com
> Sure, but that doesn't make the patch correct. The patch proposes
> that, when parallelism in use, a subquery scan will produce fewer rows
> than when parallelism is not in use, and that's 100% false. Compare
> this with the case of a parallel sequential scan. If a table contains
> 1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
> return 1000 rows.  But if we scan it with a Parallel Seq Scan using
> say 4 workers, the number of rows returned in each worker will be
> substantially less than 1000, because 1000 is now the *total* number
> of rows to be returned across *all* processes, and what we need is the
> number of rows returned in *each* process.

For now, the function cost_subqueryscan always uses the *total* rows even for a
parallel path, like this:

Gather (rows=3)
  Workers Planned: 2
  ->  Subquery Scan  (rows=3) -- *total* rows, should be equal subpath
->  Parallel Seq Scan  (rows=1)

Maybe the codes:

/* Mark the path with the correct row estimate */
if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

should change to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
path->path.rows = path->subpath->rows;
else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;



bu...@sohu.com


Re: generalized conveyor belt storage

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 8:02 AM Alvaro Herrera  wrote:
> What's with the free text in cbstorage.h?  I would guess that this
> wouldn't even compile, and nobody has noticed because the file is not
> included by anything yet ...

I think I was using that file for random notes with the idea of
removing it eventually.

I'll clean that up if I get back around to working on this. Right now
it's not clear to me how to get this integrated with vacuum in a
useful way, so finishing this part of it isn't that exciting, at least
not unless somebody else comes up with a cool use for it.

What motivated you to look at this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generalized conveyor belt storage

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 8:32 AM Thom Brown  wrote:
> On Wed, 20 Apr 2022 at 13:02, Alvaro Herrera  wrote:
> > What's with the free text in cbstorage.h?  I would guess that this
> > wouldn't even compile, and nobody has noticed because the file is not
> > included by anything yet ...
>
> I'm not able to compile:

Since I haven't been working on this for a while, the patch set's not updated.

> cbfsmpage.c: In function ‘cb_fsmpage_initialize’:
> cbfsmpage.c:34:11: warning: unused variable ‘fsm_block_spacing’
> [-Wunused-variable]
>   unsigned fsm_block_spacing = cb_fsm_block_spacing(pages_per_segment);
>^

Hmm, well I guess if that variable is unused we can just delete it.

> cbxlog.c: In function ‘cb_xlog_allocate_payload_segment’:
> cbxlog.c:70:24: error: void value not ignored as it ought to be
>   bool  have_fsm_page = XLogRecGetBlockTag(record, 1, NULL, NULL, NULL);

This is because Tom recently made that function return void instead of
bool. I guess all these will need to be changed to use
XLogRecGetBlockTagExtended and pass an extra NULL for the
*prefetch_buffer argument.
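For the first error above, the adjustment would be a sketch like this (fragment only, in the context of cbxlog.c, and assuming XLogRecGetBlockTagExtended() keeps the bool return value and just takes one extra Buffer *prefetch_buffer argument that can be passed as NULL):

    bool        have_fsm_page;

    /*
     * XLogRecGetBlockTag() now returns void, so use the extended variant,
     * which still returns bool; the trailing NULL ignores the new
     * prefetch_buffer output argument.
     */
    have_fsm_page = XLogRecGetBlockTagExtended(record, 1, NULL, NULL, NULL, NULL);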

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 10:01 AM bu...@sohu.com  wrote:
> for now fuction cost_subqueryscan always using *total* rows even parallel
> path. like this:
>
> Gather (rows=3)
>   Workers Planned: 2
>   ->  Subquery Scan  (rows=3) -- *total* rows, should be equal subpath
> ->  Parallel Seq Scan  (rows=1)

OK, that's bad.

> Maybe the codes:
>
> /* Mark the path with the correct row estimate */
> if (param_info)
> path->path.rows = param_info->ppi_rows;
> else
> path->path.rows = baserel->rows;
>
> should change to:
>
> /* Mark the path with the correct row estimate */
> if (path->path.parallel_workers > 0)
> path->path.rows = path->subpath->rows;
> else if (param_info)
> path->path.rows = param_info->ppi_rows;
> else
> path->path.rows = baserel->rows;

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.
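For clarity, the simpler form being suggested is sketched below; it is only valid if that equality always holds:

    /* Mark the path with the correct row estimate */
    if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = path->subpath->rows;  /* assumes subpath rows == baserel rows here */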

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bad estimate with partial index

2022-04-20 Thread Tom Lane
Tomas Vondra  writes:
> The whole idea is that instead of bailing out for non-RestrictInfo case,
> it calculates the necessary information for the clause from scratch.
> This means relids and pseudoconstant flag, which are checked to decide
> if the clause is compatible with extended stats.

Right.

> But when inspecting how to calculate pseudoconstant, I realized that
> maybe that's not really needed. Per distribute_qual_to_rels() we only
> set it to 'true' when bms_is_empty(relids), and we already check that
> relids is a singleton, so it can't be empty - which means pseudoconstant
> can't be true either.

Yeah, I would not bother with the pseudoconstant-related tests for a
bare clause.  Patch looks reasonably sane in a quick once-over otherwise,
and the fact that it fixes the presented test case is promising.
(If you set enable_indexscan = off, you can verify that the estimate
for the number of index entries retrieved is now sane.)  I did not look
to see if there were any other RestrictInfo dependencies, though.

regards, tom lane




RE: Bad estimate with partial index

2022-04-20 Thread André Hänsel
Tomas Vondra wrote:
> Andre, are you in position to test this fix with your application? Which 
> Postgres version are you using, actually?

There's a test case in my original email, which obviously was synthetic, but I 
could also test this with my original
application data if I can get a Postgres running with your patch.

I guess I could probably run the official Dockerfile and apply your patch 
somewhere between these lines?
https://github.com/docker-library/postgres/blob/e8ebf74e50128123a8d0220b85e357ef2d73a7ec/14/bullseye/Dockerfile#L138

André





Re: Temporary file access API

2022-04-20 Thread Antonin Houska
Robert Haas  wrote:

> On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska  wrote:
> > Robert Haas  wrote:
> > > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> > > > There are't really that many kinds of files to encrypt:
> > > >
> > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > > >
> > > > (And pg_stat/* files should be removed from the list.)
> > >
> > > This kind of gets into some theoretical questions. Like, do we think
> > > that it's an information leak if people can look at how many
> > > transactions are committing and aborting in pg_xact_status? In theory
> > > it could be, but I know it's been argued that that's too much of a
> > > side channel. I'm not sure I believe that, but it's arguable.
> >
> > I was referring to the fact that the statistics are no longer stored in 
> > files:
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd
> 
> Oh, yeah, I agree with that.

I see now that the statistics are still saved to a file on server shutdown. I've
updated the wiki page.

Attached is a new version of the patch, to evaluate what the API use in the
backend could look like. I haven't touched places where the file is accessed
in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
called.

Another use case might be copying one file to another via a buffer. Something
like

BufFileCopy(int dstfd, int srcfd, int bufsize)

The obvious call site would be in copydir.c:copy_file(), but I think there are
a few more in the server code.
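A minimal sketch of what such a helper could look like (illustration only: hypothetical name and signature from above, plain read()/write() with simplified error handling rather than the server's file-access and error-reporting machinery):

#include <stdlib.h>
#include <unistd.h>

static int
BufFileCopy(int dstfd, int srcfd, int bufsize)
{
    char       *buf = malloc(bufsize);
    ssize_t     nread;

    if (buf == NULL)
        return -1;

    /* read a chunk at a time and write it out completely before continuing */
    while ((nread = read(srcfd, buf, bufsize)) > 0)
    {
        ssize_t     written = 0;

        while (written < nread)
        {
            ssize_t     rc = write(dstfd, buf + written, nread - written);

            if (rc < 0)
            {
                free(buf);
                return -1;
            }
            written += rc;
        }
    }

    free(buf);
    return (nread < 0) ? -1 : 0;
}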

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



temp_file_api_v2.tgz
Description: application/gzip


Re: [Proposal] vacuumdb --schema only

2022-04-20 Thread Gilles Darold

Le 18/04/2022 à 23:56, Nathan Bossart a écrit :

On Thu, Apr 14, 2022 at 10:27:46PM +0200, Gilles Darold wrote:

Attached v8 of the patch that tries to address the remarks above, fixes
the patch apply failure against master and replaces calls to pg_log_error+exit
with pg_fatal.

Thanks for the new patch.


+enum trivalue schema_is_exclude = TRI_DEFAULT;
+
+/*
+ * The kind of object used in the command line filter.
+ *   OBJFILTER_NONE: no filter used
+ *   OBJFILTER_SCHEMA: -n | --schema or -N | --exclude-schema
+ *   OBJFILTER_TABLE: -t | --table
+ */
+enum VacObjectFilter
+{
+   OBJFILTER_NONE,
+   OBJFILTER_TABLE,
+   OBJFILTER_SCHEMA
+};
+
+enum VacObjectFilter objfilter = OBJFILTER_NONE;

I still think we ought to remove schema_is_exclude in favor of adding
OBJFILTER_SCHEMA_EXCLUDE to the enum.  I think that would simplify some of
the error handling and improve readability.  IMO we should add
OBJFILTER_ALL, too.


Fixed.



-   simple_string_list_append(&tables, 
optarg);
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (schema_is_exclude != TRI_DEFAULT)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+   simple_string_list_append(&objects, 
optarg);
+   objfilter = OBJFILTER_TABLE;
tbl_count++;
break;
}
@@ -202,6 +224,32 @@ main(int argc, char *argv[])
  
&vacopts.parallel_workers))
exit(1);
break;
+   case 'n':   /* include schema(s) */
+   {
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (tbl_count)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+
+   if (schema_is_exclude == TRI_YES)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and exclude specific schema(s) at the same time");
+   simple_string_list_append(&objects, 
optarg);
+   objfilter = OBJFILTER_SCHEMA;
+   schema_is_exclude = TRI_NO;
+   break;
+   }
+   case 'N':   /* exclude schema(s) */
+   {
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (tbl_count)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+   if (schema_is_exclude == TRI_NO)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and exclude specific schema(s) at the same time");
+
+   simple_string_list_append(&objects, 
optarg);
+   objfilter = OBJFILTER_SCHEMA;
+   schema_is_exclude = TRI_YES;
+   break;

I was expecting these to check objfilter.


Fixed.



Another possible improvement could be to move the pg_fatal() calls to a
helper function that generates the message based on the current objfilter
setting and the current option.  I'm envisioning something like
check_objfilter(VacObjFilter curr_objfilter, VacObjFilter curr_option).


I agree, done.
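
For reference, a hypothetical sketch of such a helper (illustrative only, not necessarily the shape used in the attached patch; it assumes the enum gains the OBJFILTER_SCHEMA_EXCLUDE value suggested above):

static void
check_objfilter(enum VacObjectFilter curr_objfilter,
                enum VacObjectFilter curr_option)
{
    /* nothing selected yet, or the same kind of filter repeated: fine */
    if (curr_objfilter == OBJFILTER_NONE || curr_objfilter == curr_option)
        return;

    /* mixing table filters with schema filters is not allowed */
    if (curr_objfilter == OBJFILTER_TABLE || curr_option == OBJFILTER_TABLE)
        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");

    /* the only remaining combination is -n together with -N */
    pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
}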



+   /*
+* When filtering on schema name, filter by table is not allowed.
+* The schema name can already be set to a fqdn table name.
+*/
+   if (tbl_count && objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
+   pg_fatal("cannot vacuum all tables in schema(s) and specific 
table(s) at the same time");

Isn't this redundant with the error in the option handling?


Fixed.



-   if (tables.head != NULL)
+   if (objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
+   {
+   if (schema_is_exclude == TRI_YES)
+   pg_fatal("cannot exclude from vacuum specific 
schema(s) in all databases");
+   else if (schema_is_exclude == TRI_NO)
+   pg_fatal("cannot vacuum specific schema(s) in all 
databases");
+  

Are OIDs for pg_types constant?

2022-04-20 Thread Tyler Brock
Hi everyone,

I am writing a program that behaves like a Postgres backend and can see
that if I select oid from pg_type, the type OIDs could be returned in
the Row Description message for the field’s data type, and that seems to
work.

However, I didn’t read anywhere that these are guaranteed to be
constant/stable so I’d like to know if this is the case. For example: is
the OID for pg_type bool = 16 on every instance of Postgres?

Also, does there exist any documentation explaining what the pg_type fields
all mean?

Please let me know, thanks!

-Tyler


Re: Are OIDs for pg_types constant?

2022-04-20 Thread Tom Lane
Tyler Brock  writes:
> I am writing a program that behaves like a Postgres backend and can see
> that if I select oid from pg_type, the type OIDs could be returned in
> the Row Description message for the field’s data type, and that seems to
> work.
> However, I didn’t read anywhere that these are guaranteed to be
> constant/stable so I’d like to know if this is the case. For example: is
> the OID for pg_type bool = 16 on every instance of Postgres?

Hand-assigned OIDs (those below 10000) are stable in released versions.
Those above 10K might vary across installations or PG versions.  You
might find it interesting to read

https://www.postgresql.org/docs/current/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-ASSIGNMENT
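
For example (an illustration, not something from this thread): a libpq client can rely on such a hand-assigned OID and compare the type OID reported for a result column against the hard-coded value for bool:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");  /* placeholder connection string */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexec(conn, "SELECT true AS flag");

    /* PQftype() reports the same type OID carried in RowDescription */
    if (PQftype(res, 0) == 16)      /* 16 = hand-assigned OID of type bool */
        printf("column 0 is of type bool\n");

    PQclear(res);
    PQfinish(conn);
    return 0;
}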

> Also, does there exist any documentation explaining what the pg_type fields
> all mean?

https://www.postgresql.org/docs/current/catalog-pg-type.html

regards, tom lane




Re: Are OIDs for pg_types constant?

2022-04-20 Thread Tyler Brock
 Thank you Tom, this is exactly what I was looking for.

-Tyler


On Apr 20, 2022 at 11:23:59 AM, Tom Lane  wrote:

> Tyler Brock  writes:
> > I am writing a program that behaves like a Postgres backend and can see
> > that if I select oid from pg_type, the type OIDs could be returned in
> > the Row Description message for the field’s data type, and that seems to
> > work.
> > However, I didn’t read anywhere that these are guaranteed to be
> > constant/stable so I’d like to know if this is the case. For example: is
> > the OID for pg_type bool = 16 on every instance of Postgres?
>
> Hand-assigned OIDs (those below 10000) are stable in released versions.
> Those above 10K might vary across installations or PG versions.  You
> might find it interesting to read
>
> https://www.postgresql.org/docs/current/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-ASSIGNMENT
>
> > Also, does there exist any documentation explaining what the pg_type fields
> > all mean?
>
> https://www.postgresql.org/docs/current/catalog-pg-type.html
>
> regards, tom lane


Re: pg14 psql broke \d datname.nspname.relname

2022-04-20 Thread Robert Haas
On Tue, Apr 19, 2022 at 10:20 PM Mark Dilger
 wrote:
> Looks like most people voted for (B).  In support of that option, here are 
> patches for master and REL_14_STABLE.  Note that I extended the tests 
> compared to v9, which found a problem that is fixed for v10:

OK, I committed these. I am not totally sure we've got all the
problems sorted here, but I don't think that continuing to not commit
anything is going to be better, so here we go.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix NULL pointer reference in _outPathTarget()

2022-04-20 Thread Peter Eisentraut

On 18.04.22 09:35, Richard Guo wrote:

The array sortgrouprefs[] inside PathTarget might be NULL if we have not
identified sort/group columns in this tlist. In that case we would have
a NULL pointer reference in _outPathTarget() when trying to print
sortgrouprefs[] with WRITE_INDEX_ARRAY as we are using the length of
PathTarget->exprs as its array length.


Do you have a test case that triggers this issue?




Re: Fix NULL pointer reference in _outPathTarget()

2022-04-20 Thread Peter Eisentraut

On 18.04.22 20:53, Tom Lane wrote:

A semantics-preserving conversion would have looked something like

 if (node->sortgrouprefs)
 WRITE_INDEX_ARRAY(sortgrouprefs, list_length(node->exprs));

I suppose that Peter was trying to remove special cases from the
outfuncs.c code, but do we want to put this one back?  Richard's
proposal would not accurately reflect the contents of the data
structure, so I'm not too thrilled with it.


I think we could put the if (node->fldname) inside the WRITE_INDEX_ARRAY 
macro.
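
A minimal sketch of that suggestion, assuming the macro keeps roughly its
current appendStringInfo-based shape in outfuncs.c; the exact current
definition may differ, and whether to emit anything at all for a NULL array
is the open question discussed below.

/*
 * Hypothetical variant with the NULL check folded into the macro, following
 * the style of the other WRITE_* macros in outfuncs.c (which assume local
 * variables "node" and "str").  Illustration only.
 */
#define WRITE_INDEX_ARRAY(fldname, len) \
	do { \
		if (node->fldname != NULL) \
		{ \
			appendStringInfoString(str, " :" CppAsString(fldname)); \
			for (int i = 0; i < (len); i++) \
				appendStringInfo(str, " %u", node->fldname[i]); \
		} \
	} while (0)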





Re: generalized conveyor belt storage

2022-04-20 Thread Alvaro Herrera
On 2022-Apr-20, Robert Haas wrote:

> I'll clean that up if I get back around to working on this. Right now
> it's not clear to me how to get this integrated with vacuum in a
> useful way, so finishing this part of it isn't that exciting, at least
> not unless somebody else comes up with a cool use for it.

Oh, okay.

> What motivated you to look at this?

Shrug ... it happened to fall in my mutt index view and I wanted to have
a general idea of where it was going.  I have no immediate use for it,
sorry.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)




Re: Fix NULL pointer reference in _outPathTarget()

2022-04-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 18.04.22 20:53, Tom Lane wrote:
>> A semantics-preserving conversion would have looked something like
>>  if (node->sortgrouprefs)
>>  WRITE_INDEX_ARRAY(sortgrouprefs, list_length(node->exprs));

> I think we could put the if (node->fldname) inside the WRITE_INDEX_ARRAY 
> macro.

Yeah, that's another way to do it.  I think though that the unresolved
question is whether or not we want the field name to appear in the output
when the field is null.  I believe that I intentionally made it not appear
originally, so that that case could readily be distinguished.  You could
argue that that would complicate life greatly for a _readPathTarget()
function, which is true, but I don't foresee that we'll need one.

regards, tom lane




Re: using an end-of-recovery record in all cases

2022-04-20 Thread Nathan Bossart
On Wed, Apr 20, 2022 at 09:26:07AM -0400, Robert Haas wrote:
> I was talking with Thomas Munro yesterday and he thinks there is a
> problem with relfilenode reuse here. In normal running, when a
> relation is dropped, we leave behind a 0-length file until the next
> checkpoint; this keeps that relfilenode from being used even if the
> OID counter wraps around. If we didn't do that, then imagine that
> while running with wal_level=minimal, we drop an existing relation,
> create a new relation with the same OID, load some data into it, and
> crash, all within the same checkpoint cycle, then we will be able to
> replay the drop, but we will not be able to restore the relation
> contents afterward because at wal_level=minimal they are not logged.
> Apparently, we don't create tombstone files during recovery because we
> know that there will be a checkpoint at the end.

In the example you provided, won't the tombstone file already be present
before the crash?  During recovery, the tombstone file will be removed, and
the new relation wouldn't use the same relfilenode anyway.  I'm probably
missing something obvious here.

I do see the problem if we drop an existing relation, crash, reuse the
filenode, and then crash again (all within the same checkpoint cycle).  The
first recovery would remove the tombstone file, and the second recovery
would wipe out the new relation's files.

> With the existing use of the end-of-recovery record, we always know
> that wal_level>minimal, because we're only using it on standbys. But
> with this use that wouldn't be true any more. So I guess we need to
> start creating tombstone files even during recovery, or else do
> something like what Dilip coded up in
> http://postgr.es/m/CAFiTN-u=r8UTCSzu6_pnihYAtwR1=esq5sregtez2tla92f...@mail.gmail.com
> which I think would be a better solution at least in the long term.

IMO this would be good just to reduce the branching a bit.  I suppose
removing the files immediately during recovery might be an optimization in
some cases, but I am skeptical that it really makes that much of a
difference in practice.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: using an end-of-recovery record in all cases

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 5:02 AM Nathan Bossart  wrote:
> I do see the problem if we drop an existing relation, crash, reuse the
> filenode, and then crash again (all within the same checkpoint cycle).  The
> first recovery would remove the tombstone file, and the second recovery
> would wipe out the new relation's files.

Right, the double-crash case is what I was worrying about.  I'm not
sure, but it might even be more likely than usual that you'll reuse
the same relfilenode after the first crash, because the OID allocator
will start from the same value.




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-04-20 Thread Stephen Frost
Greetings,

* Martín Marqués (martin.marq...@gmail.com) wrote:
> The typo is in `exist in in a running cluster`. There's two `in` in a row.

Oops, thanks for catching (and thanks to Michael for committing the
fix!).

> P.D.: I was looking at this just because I was looking at an issue
> where someone bumped their head with this "problem", so great that
> we're in a better place now. Hopefully one day everyone will be
> running PG15 or better :)

Agreed!  Does make me wonder just how often folks run into this issue..
Glad that we were able to get the change in for v15.

Thanks again!

Stephen


signature.asc
Description: PGP signature


Re: [Proposal] vacuumdb --schema only

2022-04-20 Thread Nathan Bossart
Thanks for the new patch!  I think this is on the right track.

On Wed, Apr 20, 2022 at 05:15:02PM +0200, Gilles Darold wrote:
> Le 18/04/2022 à 23:56, Nathan Bossart a écrit :
>> > -  if (!tables_listed)
>> > +  if (!objects_listed || objfilter == OBJFILTER_SCHEMA)
>> Do we need to check for objects_listed here?  IIUC we can just check for
>> objfilter != OBJFILTER_TABLE.
> 
> Yes, we need it; otherwise the 'vacuumdb with view' test fails because we are
> not trying to vacuum the view, so PG doesn't report:
> 
>     WARNING:  cannot vacuum non-tables or special system tables

My point is that the only time we don't want to filter for relevant
relation types is when the user provides a list of tables.  So my
suggestion would be to simplify this to the following:

if (objfilter != OBJFILTER_TABLE)
{
appendPQExpBufferStr(...);
has_where = true;
}

>> Unless I'm missing something, schema_is_exclude appears to only be used for
>> error checking and doesn't impact the generated catalog query.  It looks
>> like the relevant logic disappeared after v4 of the patch.
> 
> Right, removed.

I don't think -N works at the moment.  I tested it out, and vacuumdb was
still processing tables in schemas I excluded.  Can we add a test case for
this, too?

> +/*
> + * Verify that the filters used at command line are compatible
> + */
> +void
> +check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
> +{
> + switch (curr_option)
> + {
> + case OBJFILTER_NONE:
> + break;
> + case OBJFILTER_DATABASE:
> + /* When filtering on database name, vacuum on all 
> database is not allowed. */
> + if (curr_objfilter == OBJFILTER_ALL)
> + pg_fatal("cannot vacuum all databases and a 
> specific one at the same time");
> + break;
> + case OBJFILTER_ALL:
> + /* When vacuuming all database, filter on database name 
> is not allowed. */
> + if (curr_objfilter == OBJFILTER_DATABASE)
> + pg_fatal("cannot vacuum all databases and a 
> specific one at the same time");
> + /* When vacuuming all database, filter on schema name 
> is not allowed. */
> + if (curr_objfilter == OBJFILTER_SCHEMA)
> + pg_fatal("cannot vacuum specific schema(s) in 
> all databases");
> + /* When vacuuming all database, schema exclusion is not 
> allowed. */
> + if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> + pg_fatal("cannot exclude from vacuum specific 
> schema(s) in all databases");
> + /* When vacuuming all database, filter on table name is 
> not allowed. */
> + if (curr_objfilter == OBJFILTER_TABLE)
> + pg_fatal("cannot vacuum specific table(s) in 
> all databases");
> + break;
> + case OBJFILTER_TABLE:
> + /* When filtering on table name, filter by schema is 
> not allowed. */
> + if (curr_objfilter == OBJFILTER_SCHEMA)
> + pg_fatal("cannot vacuum all tables in schema(s) 
> and specific table(s) at the same time");
> + /* When filtering on table name, schema exclusion is 
> not allowed. */
> + if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> + pg_fatal("cannot vacuum specific table(s) and 
> exclude specific schema(s) at the same time");
> + break;
> + case OBJFILTER_SCHEMA:
> + /* When filtering on schema name, filter by table is 
> not allowed. */
> + if (curr_objfilter == OBJFILTER_TABLE)
> + pg_fatal("cannot vacuum all tables in schema(s) 
> and specific table(s) at the same time");
> + /* When filtering on schema name, schema exclusion is 
> not allowed. */
> + if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> + pg_fatal("cannot vacuum all tables in schema(s) 
> and exclude specific schema(s) at the same time");
> + /* filtering on schema name can not be use on all 
> database. */
> + if (curr_objfilter == OBJFILTER_ALL)
> + pg_fatal("cannot vacuum specific schema(s) in 
> all databases");
> + break;
> + case OBJFILTER_SCHEMA_EXCLUDE:
> + /* When filtering on schema exclusion, filter by table 
> is not allowed. */
> + if (curr_objfilter == OBJFILTER_TABLE)
> + pg_fatal("cannot vacuum all tables in schema(s) 
> and specific table(s) at the same time");
> +

Re: [Proposal] vacuumdb --schema only

2022-04-20 Thread Justin Pryzby
On Wed, Apr 20, 2022 at 10:38:46AM -0700, Nathan Bossart wrote:
> > +void
> > +check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter 
> > curr_option)
> > +{
> > +   switch (curr_option)
> > +   {
> > +   case OBJFILTER_NONE:
> > +   break;
> > +   case OBJFILTER_DATABASE:
> > +   /* When filtering on database name, vacuum on all 
> > database is not allowed. */
> > +   if (curr_objfilter == OBJFILTER_ALL)
> > +   pg_fatal("cannot vacuum all databases and a 
> > specific one at the same time");
> > +   break;
> > +   case OBJFILTER_ALL:
> > +   /* When vacuuming all database, filter on database name 
> > is not allowed. */
> > +   if (curr_objfilter == OBJFILTER_DATABASE)
> > +   pg_fatal("cannot vacuum all databases and a 
> > specific one at the same time");
> > +   /* When vacuuming all database, filter on schema name 
> > is not allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA)
> > +   pg_fatal("cannot vacuum specific schema(s) in 
> > all databases");
> > +   /* When vacuuming all database, schema exclusion is not 
> > allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +   pg_fatal("cannot exclude from vacuum specific 
> > schema(s) in all databases");
> > +   /* When vacuuming all database, filter on table name is 
> > not allowed. */
> > +   if (curr_objfilter == OBJFILTER_TABLE)
> > +   pg_fatal("cannot vacuum specific table(s) in 
> > all databases");
> > +   break;
> > +   case OBJFILTER_TABLE:
> > +   /* When filtering on table name, filter by schema is 
> > not allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA)
> > +   pg_fatal("cannot vacuum all tables in schema(s) 
> > and specific table(s) at the same time");
> > +   /* When filtering on table name, schema exclusion is 
> > not allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +   pg_fatal("cannot vacuum specific table(s) and 
> > exclude specific schema(s) at the same time");
> > +   break;
> > +   case OBJFILTER_SCHEMA:
> > +   /* When filtering on schema name, filter by table is 
> > not allowed. */
> > +   if (curr_objfilter == OBJFILTER_TABLE)
> > +   pg_fatal("cannot vacuum all tables in schema(s) 
> > and specific table(s) at the same time");
> > +   /* When filtering on schema name, schema exclusion is 
> > not allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +   pg_fatal("cannot vacuum all tables in schema(s) 
> > and exclude specific schema(s) at the same time");
> > +   /* filtering on schema name can not be use on all 
> > database. */
> > +   if (curr_objfilter == OBJFILTER_ALL)
> > +   pg_fatal("cannot vacuum specific schema(s) in 
> > all databases");
> > +   break;
> > +   case OBJFILTER_SCHEMA_EXCLUDE:
> > +   /* When filtering on schema exclusion, filter by table 
> > is not allowed. */
> > +   if (curr_objfilter == OBJFILTER_TABLE)
> > +   pg_fatal("cannot vacuum all tables in schema(s) 
> > and specific table(s) at the same time");
> > +   /* When filtering on schema exclusion, filter by schema 
> > is not allowed. */
> > +   if (curr_objfilter == OBJFILTER_SCHEMA)
> > +   pg_fatal("cannot vacuum all tables in schema(s) 
> > and exclude specific schema(s) at the same time");
> > +   break;
> > +   }
> > +}
> 
> I don't think this handles all combinations.  For example, the following
> command does not fail:
> 
>   vacuumdb -a -N test postgres
> 
> Furthermore, do you think it'd be possible to dynamically generate the
> message?

Not in the obvious way, because that breaks translatability.
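
To spell out the translatability point: messages written as string literals
are extracted into the translation catalogs, whereas a message assembled at
run time never reaches translators. A hedged illustration follows;
describe_filter() is a hypothetical helper, not existing code.

	char		msg[256];

	/*
	 * Translatable: xgettext sees this literal, so it lands in the message
	 * catalogs as-is.
	 */
	pg_fatal("cannot vacuum all databases and a specific one at the same time");

	/*
	 * Not translatable as a sentence: the final string is assembled at run
	 * time, so translators never see it, and stitching fragments together
	 * tends to be ungrammatical in other languages anyway.
	 */
	snprintf(msg, sizeof(msg), "cannot vacuum %s and %s at the same time",
			 describe_filter(opt1), describe_filter(opt2));
	pg_fatal("%s", msg);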

-- 
Justin




Re: [Proposal] vacuumdb --schema only

2022-04-20 Thread Nathan Bossart
On Wed, Apr 20, 2022 at 12:40:52PM -0500, Justin Pryzby wrote:
> On Wed, Apr 20, 2022 at 10:38:46AM -0700, Nathan Bossart wrote:
>> Furthermore, do you think it'd be possible to dynamically generate the
>> message?
> 
> Not in the obvious way, because that breaks translatability.

Ah, right.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: effective_io_concurrency and NVMe devices

2022-04-20 Thread Nathan Bossart
On Tue, Apr 19, 2022 at 10:56:05PM -0400, Bruce Momjian wrote:
> NVMe devices have a maximum queue length of 64k:
> 
>   https://blog.westerndigital.com/nvme-queues-explained/
> 
> but our effective_io_concurrency maximum is 1,000:
> 
>   test=> set effective_io_concurrency = 1001;
>   ERROR:  1001 is outside the valid range for parameter 
> "effective_io_concurrency" (0 .. 1000)
> 
> Should we increase its maximum to 64k?  Backpatched?  (SATA has a
> maximum queue length of 256.)

If there are demonstrable improvements with higher values, this seems
reasonable to me.  I would even suggest removing the limit completely so
this doesn't need to be revisited in the future.
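
As a sketch only, assuming the upper bound is still taken from the
MAX_IO_CONCURRENCY define in src/include/storage/bufmgr.h (which the GUC
range checks reference), the change itself would be tiny; the real work is
the benchmarking.

/*
 * Hypothetical one-line change in src/include/storage/bufmgr.h.  Whether
 * values anywhere near the NVMe per-queue limit actually help prefetching
 * is exactly what would need benchmarks.
 */
#define MAX_IO_CONCURRENCY	65536	/* previously 1000 */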

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Bad estimate with partial index

2022-04-20 Thread Tomas Vondra


On 4/20/22 16:15, Tom Lane wrote:
> Tomas Vondra  writes:
>> The whole idea is that instead of bailing out for non-RestrictInfo case,
>> it calculates the necessary information for the clause from scratch.
>> This means relids and pseudoconstant flag, which are checked to decide
>> if the clause is compatible with extended stats.
> 
> Right.
> 
>> But when inspecting how to calculate pseudoconstant, I realized that
>> maybe that's not really needed. Per distribute_qual_to_rels() we only
>> set it to 'true' when bms_is_empty(relids), and we already check that
>> relids is a singleton, so it can't be empty - which means pseudoconstant
>> can't be true either.
> 
> Yeah, I would not bother with the pseudoconstant-related tests for a
> bare clause.  Patch looks reasonably sane in a quick once-over otherwise,
> and the fact that it fixes the presented test case is promising.

Attached is a slightly more polished patch, adding a couple comments and
removing the unnecessary pseudoconstant checks.

> (If you set enable_indexscan = off, you can verify that the estimate
> for the number of index entries retrieved is now sane.) 

I did that. Sorry for not mentioning that explicitly in my message.

> I did not look to see if there were any other RestrictInfo
> dependencies, though.

I checked the places messing with RestrictInfo in clausesel.c and
src/backend/statistics. Code in clausesel.c seems fine and mcv.c
seems fine too (it merely strips the RestrictInfo).

But dependencies.c might need a fix too, although the issue is somewhat
inverse to this one, because it looks like this:

if (IsA(clause, RestrictInfo))
{
... do some checks ...
}

so if there's no RestrictInfo on top, we just accept the clause. I guess
this should do the same thing with checking relids like the fix, but
I've been unable to construct an example demonstrating the issue (it'd
have to be either pseudoconstant or reference multiple rels, which seems
hard to get in btcostestimate).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 06f836308d..c3131e6c2c 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -520,12 +520,18 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
  *		Examine each clause in 'clauses' and determine if all clauses
  *		reference only a single relation.  If so return that relation,
  *		otherwise return NULL.
+ *
+ * The clauses may be either RestrictInfos or bare expression clauses, same as
+ * for clauselist_selectivity. The former is preferred since it allows caching
+ * of results, but some callers pass bare expressions.
+
  */
 static RelOptInfo *
 find_single_rel_for_clauses(PlannerInfo *root, List *clauses)
 {
 	int			lastrelid = 0;
 	ListCell   *l;
+	Bitmapset  *clause_relids;
 
 	foreach(l, clauses)
 	{
@@ -560,12 +566,20 @@ find_single_rel_for_clauses(PlannerInfo *root, List *clauses)
 			continue;
 		}
 
-		if (!IsA(rinfo, RestrictInfo))
-			return NULL;
+		/*
+		 * For a RestrictInfo, we can use the cached relids. Otherwise
+		 * inspect the clause to build the bitmap.
+		 */
+		if (IsA(rinfo, RestrictInfo))
+			clause_relids = rinfo->clause_relids;
+		else
+			clause_relids = pull_varnos(root, (Node *) rinfo);
+
 
-		if (bms_is_empty(rinfo->clause_relids))
-			continue;			/* we can ignore variable-free clauses */
-		if (!bms_get_singleton_member(rinfo->clause_relids, &relid))
+		if (bms_is_empty(clause_relids))
+			continue;			/* we can ignore variable-free clauses (this
+ * includes pseudoconstants) */
+		if (!bms_get_singleton_member(clause_relids, &relid))
 			return NULL;		/* multiple relations in this clause */
 		if (lastrelid == 0)
 			lastrelid = relid;	/* first clause referencing a relation */
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ca48395d5c..907141dfee 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1560,9 +1560,9 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 			 Bitmapset **attnums, List **exprs)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
-	RestrictInfo *rinfo = (RestrictInfo *) clause;
-	int			clause_relid;
+	Bitmapset  *relids;
 	Oid			userid;
+	int			clause_relid;
 
 	/*
 	 * Special-case handling for bare BoolExpr AND clauses, because the
@@ -1574,10 +1574,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
 
-		/*
-		 * Check that each sub-clause is compatible.  We expect these to be
-		 * RestrictInfos.
-		 */
+		/* Check that each sub-clause is compatible. */
 		foreach(lc, expr->args)
 		{
 			if (!statext_is_compatible_clause(root, (Node *) lfirst(lc),
@@ -1588,21 +1585,30 @@ statext_is_compatible_clause(

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
>> Agree. In the latest patch, the template0 and postgres OIDs are fixed
>> to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
>> no more listed as unused OIDs.

> Thanks. Committed with a few more cosmetic changes.

I happened to take a closer look at this patch today, and I'm pretty
unhappy with the way that the assignment of those OIDs was managed.
There are two big problems:

1. IsPinnedObject() will now report that template0 and postgres are
pinned.  This seems not to prevent one from dropping them (I guess
dropdb() doesn't consult IsPinnedObject), but it would probably bollix
any pg_shdepend management that should happen for them.

2. The Catalog.pm infrastructure knows nothing about these OIDs.
While the unused_oids script was hack-n-slashed to claim that the
OIDs are used, other scripts won't know about them; for example
duplicate_oids won't report conflicts if someone tries to reuse
those OIDs.

The attached draft patch attempts to improve this situation.
It reserves these OIDs, and creates the associated macros, through
the normal BKI infrastructure by adding entries in pg_database.dat.
We have to delete those rows again during initdb, which is slightly
ugly but surely no more so than initdb's other direct manipulations
of pg_database.

There are a few loose ends:

* I'm a bit inclined to simplify IsPinnedObject by just teaching
it that *no* entries of pg_database are pinned, which would correspond
to the evident lack of enforcement in dropdb().  Can anyone see a
reason why we might pin some database in future?

* I had to set up the additional pg_database entries with nonzero
datfrozenxid to avoid an assertion failure during initdb's first VACUUM.
(That VACUUM will overwrite template1's datfrozenxid before computing
the global minimum frozen XID, but not these others; and it doesn't like
finding that the minimum is zero.)  This feels klugy.  An alternative is
to delete the extra pg_database rows before that VACUUM, which would
mean taking those deletes out of make_template0 and make_postgres and
putting them somewhere seemingly unrelated, so that's a bit crufty too.
Anybody have a preference?

* The new macro names seem ill-chosen.  Template0ObjectId is spelled
randomly differently from the longstanding TemplateDbOid, and surely
PostgresObjectId is about as vague a name as could possibly have
been thought of (please name an object that it couldn't apply to).
I'm a little inclined to rename TemplateDbOid to Template1DbOid and
use Template0DbOid and PostgresDbOid for the others, but I didn't pull
that trigger here.

Comments?

regards, tom lane

diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 520f77971b..d7e5c02f95 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -339,9 +339,11 @@ IsPinnedObject(Oid classId, Oid objectId)
 	 * robustness.
 	 */
 
-	/* template1 is not pinned */
+	/* template1, template0, postgres are not pinned */
 	if (classId == DatabaseRelationId &&
-		objectId == TemplateDbOid)
+		(objectId == TemplateDbOid ||
+		 objectId == Template0ObjectId ||
+		 objectId == PostgresObjectId))
 		return false;
 
 	/* the public namespace is not pinned */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1cb4a5b0d2..04454b3d7c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -59,11 +59,11 @@
 #include "sys/mman.h"
 #endif
 
-#include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h" /* pgrminclude ignore */
 #include "catalog/pg_collation_d.h"
+#include "catalog/pg_database_d.h"	/* pgrminclude ignore */
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
@@ -1806,14 +1806,24 @@ make_template0(FILE *cmdfd)
 	 * objects in the old cluster, the problem scenario only exists if the OID
 	 * that is in use in the old cluster is also used in the new cluster - and
 	 * the new cluster should be the result of a fresh initdb.)
-	 *
-	 * We use "STRATEGY = file_copy" here because checkpoints during initdb
-	 * are cheap. "STRATEGY = wal_log" would generate more WAL, which would
-	 * be a little bit slower and make the new cluster a little bit bigger.
 	 */
 	static const char *const template0_setup[] = {
-		"CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID = "
-		CppAsString2(Template0ObjectId)
+		/*
+		 * Since pg_database.dat includes a dummy row for template0, we have
+		 * to remove that before creating the database for real.
+		 */
+		"DELETE FROM pg_database WHERE datname = 'template0';\n\n",
+
+		/*
+		 * Clone template1 to make template0.
+		 *
+		 * We use "STRATEGY = file_copy" here because checkpoints during
+		 * initdb are cheap.  "STRATEGY = wal_log" would generate more WAL,
+		 * which would be a little bit slower and make the new

Re: Bad estimate with partial index

2022-04-20 Thread Tom Lane
Tomas Vondra  writes:
> But dependencies.c might need a fix too, although the issue is somewhat
> inverse to this one, because it looks like this:

> if (IsA(clause, RestrictInfo))
> {
> ... do some checks ...
> }

> so if there's no RestrictInfo on top, we just accept the clause. I guess
> this should do the same thing with checking relids like the fix, but
> I've been unable to construct an example demonstrating the issue (it'd
> have to be either pseudoconstant or reference multiple rels, which seems
> hard to get in btcostestimate).

Hm.  You could get an indexqual referencing other rels when considering
doing a join via a nestloop with parameterized inner indexscan.  However,
that would always be a query WHERE clause, which'd have a RestrictInfo.
At least in this code path, a bare clause would have to be a partial
index's predicate, which could not reference any other rels.  The
pseudoconstant case would require a predicate reducing to WHERE FALSE
or WHERE TRUE, which is at best pointless, though I'm not sure that
we prevent it.

You might have to go looking for other code paths that can pass a
bare clause if you want a test case for this.

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 3:55 AM Robert Haas  wrote:
> On Tue, Apr 19, 2022 at 10:20 PM Mark Dilger
>  wrote:
> > Looks like most people voted for (B).  In support of that option, here are 
> > patches for master and REL_14_STABLE.  Note that I extended the tests 
> > compared to v9, which found a problem that is fixed for v10:
>
> OK, I committed these. I am not totally sure we've got all the
> problems sorted here, but I don't think that continuing to not commit
> anything is going to be better, so here we go.

Looks like this somehow broke on a Windows box:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19

[14:05:49.729](0.001s) not ok 16 - pg_dumpall: option
--exclude-database rejects multipart pattern ".*": matches
[14:05:49.730](0.000s)
[14:05:49.730](0.000s) #   Failed test 'pg_dumpall: option
--exclude-database rejects multipart pattern ".*": matches'
#   at t/002_pg_dump.pl line 3985.
[14:05:49.730](0.000s) #   'pg_dumpall: error:
improper qualified name (too many dotted names): .gitignore
# '
# doesn't match '(?^:pg_dumpall: error: improper qualified name
\\(too many dotted names\\): \\.\\*)'




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 2:34 PM Tom Lane  wrote:
> The attached draft patch attempts to improve this situation.
> It reserves these OIDs, and creates the associated macros, through
> the normal BKI infrastructure by adding entries in pg_database.dat.
> We have to delete those rows again during initdb, which is slightly
> ugly but surely no more so than initdb's other direct manipulations
> of pg_database.

I'm not sure I really like this approach, but if you're firmly
convinced that it's better than cleaning up the loose ends in some
other way, I'm not going to waste a lot of energy fighting about it.
It doesn't seem horrible.

> There are a few loose ends:
>
> * I'm a bit inclined to simplify IsPinnedObject by just teaching
> it that *no* entries of pg_database are pinned, which would correspond
> to the evident lack of enforcement in dropdb().  Can anyone see a
> reason why we might pin some database in future?

It's kind of curious that we don't pin anything now. There's kind of
nothing to stop you from hosing the system by dropping template0
and/or template1, or mutating them into some crazy form that doesn't
work. But having said that, if as a matter of policy we don't even pin
template0 or template1 or postgres, then it seems unlikely that we
would suddenly decide to pin some new database in the future.

> * I had to set up the additional pg_database entries with nonzero
> datfrozenxid to avoid an assertion failure during initdb's first VACUUM.
> (That VACUUM will overwrite template1's datfrozenxid before computing
> the global minimum frozen XID, but not these others; and it doesn't like
> finding that the minimum is zero.)  This feels klugy.  An alternative is
> to delete the extra pg_database rows before that VACUUM, which would
> mean taking those deletes out of make_template0 and make_postgres and
> putting them somewhere seemingly unrelated, so that's a bit crufty too.
> Anybody have a preference?

Not really. If anything that's an argument against this entire
approach, but I already commented on that point above.

> * The new macro names seem ill-chosen.  Template0ObjectId is spelled
> randomly differently from the longstanding TemplateDbOid, and surely
> PostgresObjectId is about as vague a name as could possibly have
> been thought of (please name an object that it couldn't apply to).
> I'm a little inclined to rename TemplateDbOid to Template1DbOid and
> use Template0DbOid and PostgresDbOid for the others, but I didn't pull
> that trigger here.

Seems totally reasonable. I don't find the current naming horrible or
I'd not have committed it that way, but putting Db in there makes
sense, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg14 psql broke \d datname.nspname.relname

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro  wrote:
> Looks like this somehow broke on a Windows box:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19

So the issue here is that we are running this command:

pg_dumpall --exclude-database .*

And on that Windows machine, .* is being expanded to .gitignore, so
pg_dumpall prints:

pg_dumpall: error: improper qualified name (too many dotted names): .gitignore

Instead of:

pg_dumpall: error: improper qualified name (too many dotted names): .*

I don't know why that glob-expansion only happens on jacana, and I
don't know how to fix it, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Postgres perl module namespace

2022-04-20 Thread Andrew Dunstan


On 2022-04-19 Tu 20:30, Michael Paquier wrote:
> On Tue, Apr 19, 2022 at 07:24:58PM -0400, Andrew Dunstan wrote:
>> On 2022-04-19 Tu 18:39, Michael Paquier wrote:
>>> +*generate_ascii_string = *TestLib::generate_ascii_string;
>>> +*slurp_dir = *TestLib::slurp_dir;
>>> +*slurp_file = *TestLib::slurp_file;
>>>
>>> I am not sure if it is possible and my perl-fu is limited in this
>>> area, but could a failure be enforced when loading this path if a new
>>> routine added in TestLib.pm is forgotten in this list?
>> Not very easily that I'm aware of, but maybe some superior perl wizard
>> will know better.
> Okay.  Please do not consider this as a blocker.  I was just wondering
> about ways to ease more the error reports when it comes to
> back-patching, and this would move the error stack a bit earlier.



There are a few other things that could make backpatching harder, and
while they are not related to the namespace issue they do affect a bit
how that is managed.


The following variables are missing in various versions of TestLib:


in version 13 and earlier: $is_msys2, $timeout_default

in version 12 and earlier: $use_unix_sockets


and the following functions are missing:


in version 14 and earlier: pump_until

in version 13 and earlier: dir_symlink

in version 11 and earlier: run_command

in version 10: check_mode_recursive, chmod_recursive,  check_pg_config


(Also in version 10 command_checks_all exists but isn't exported. I'm
inclined just to remedy that along the way)


Turning to PostgresNode, the class-wide function get_free_port is absent
from version 10, and the following instance methods are absent from some
or all of the back branches:

adjust_conf, clean_node, command_fails_like, config_data, connect_fails,
connect_ok, corrupt_page_checksum, group_access, installed_command,
install_path, interactive_psql, logrotate, set_recovery_mode,
set_standby_mode, wait_for_log

We don't export or provide aliases for any of these instance methods in
these patches, but attempts to use them in backpatched code will fail
where they are absent, so I thought it worth mentioning.


Basically I propose just to remove any mention of the Testlib items and
get_free_port from the export and alias lists for versions where they
are absent. If backpatchers need a function they can backport it if
necessary.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replace open mode with PG_BINARY_R/W/A macros

2022-04-20 Thread Peter Eisentraut



On 19.04.22 16:21, Tom Lane wrote:

* In the other direction, decide that the PG_BINARY_X macros are
offering no benefit at all and just rip 'em out, writing "rb" and
so on in their place.  POSIX specifies that the character "b" has
no effect on Unix-oid systems, and it has said that for thirty years
now, so we do not really need the platform dependency that presently
exists in the macro definitions.  The presence or absence of "b"
would serve fine as an indicator of intent, and there would be one
less PG-specific coding convention to remember.


I can only imagine that there must have been some Unix systems that did 
not understand the "binary" APIs required for Windows.  (For example, 
neither the Linux nor the macOS open(2) man page mentions O_BINARY.) 
Otherwise, these macros don't make any sense, because then you could 
just write the thing directly on all platforms.





when should we set DB_IN_PRODUCTION?

2022-04-20 Thread Robert Haas
The following code doesn't make a lot of sense to me:

/*
 * All done with end-of-recovery actions.
 *
 * Now allow backends to write WAL and update the control file status in
 * consequence.  SharedRecoveryState, that controls if backends can write
 * WAL, is updated while holding ControlFileLock to prevent other backends
 * to look at an inconsistent state of the control file in shared memory.
 * There is still a small window during which backends can write WAL and
 * the control file is still referring to a system not in DB_IN_PRODUCTION
 * state while looking at the on-disk control file.
 *
 * Also, we use info_lck to update SharedRecoveryState to ensure that
 * there are no race conditions concerning visibility of other recent
 * updates to shared memory.
 */
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->state = DB_IN_PRODUCTION;

SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
SpinLockRelease(&XLogCtl->info_lck);

UpdateControlFile();
LWLockRelease(ControlFileLock);

Before ebdf5bf7d1c97a926e2b0cb6523344c2643623c7 (2016) we changed the
control file state first, then did a bunch of stuff like StartupCLOG()
and TrimCLOG() and RecoverPreparedTransactions(), and then set
RECOVERY_STATE_DONE. Judging by the commit message for the
aforementioned commit, some people didn't like the fact that
DB_IN_PRODUCTION would show up in the control file potentially some
time in advance of when the server was actually able to accept
read-write connections. However, it seems to me that we now have the
opposite problem: as soon as we set RECOVERY_STATE_DONE in shared
memory, some other backend can write WAL, which I think means that we
might start writing WAL before the control file says we're in
production. Which might not be an entirely cosmetic problem, because
there's code that looks at whether the control file state is
DB_SHUTDOWNED to figure out whether we crashed previously -- and it's
hard to imagine that it would be OK to write some WAL, crash before
updating the control file state, and then observe the control file
state on restart to still be indicative of a clean shutdown. I imagine
the race is rather narrow, but I don't see what prevents it in theory.

It seems reasonable to me to want these changes to happen as close
together as possible, and before
ebdf5bf7d1c97a926e2b0cb6523344c2643623c7 that wasn't the case, so I do
acknowledge that there was a legitimate reason to make a change. But I
don't think the change was correct in detail. True simultaneity is
impossible, and one change or the other must happen first, rather than
interleaving them as this code does. And I think the one that should
happen first is the control file update, including the on-disk copy,
because no WAL should be generated until that's fully complete. If
that's not good enough, we could update the control file twice, once
just before allowing connections to say that we're about to allow new
WAL, and once just after to confirm that we did. The first update
would be for the benefit of the server itself, so that it can be
certain whether any WAL might have been generated, and the second one
would just be for the benefit of observers.

Thoughts?
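
A rough sketch of the single-update ordering described above, reusing the
locks and calls from the quoted code; this is an illustration of the idea,
not a tested patch (the two-update variant would add an earlier "about to
allow WAL" write before this sequence).

	/*
	 * Persist DB_IN_PRODUCTION, including the on-disk control file, before
	 * any other backend is allowed to write WAL.
	 */
	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
	ControlFile->state = DB_IN_PRODUCTION;
	UpdateControlFile();		/* on-disk copy updated before any new WAL */
	LWLockRelease(ControlFileLock);

	/* Only now allow backends to write WAL. */
	SpinLockAcquire(&XLogCtl->info_lck);
	XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
	SpinLockRelease(&XLogCtl->info_lck);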

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Replace open mode with PG_BINARY_R/W/A macros

2022-04-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 19.04.22 16:21, Tom Lane wrote:
>> * In the other direction, decide that the PG_BINARY_X macros are
>> offering no benefit at all and just rip 'em out, writing "rb" and
>> so on in their place.  POSIX specifies that the character "b" has
>> no effect on Unix-oid systems, and it has said that for thirty years
>> now, so we do not really need the platform dependency that presently
>> exists in the macro definitions.  The presence or absence of "b"
>> would serve fine as an indicator of intent, and there would be one
>> less PG-specific coding convention to remember.

> I can only imagine that there must have been some Unix systems that did 
> not understand the "binary" APIs required for Windows.  (For example, 
> neither the Linux nor the macOS open(2) man page mentions O_BINARY.) 
> Otherwise, these macros don't make any sense, because then you could 
> just write the thing directly on all platforms.

PG_BINARY is useful for open().  It's the PG_BINARY_R/W/A macros for
fopen() that are redundant per POSIX.  Possibly someone generalized
inappropriately; or maybe long ago we supported some platform that
rejected the "b" option?
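
A small self-contained illustration of that distinction, assuming PG_BINARY
expands to O_BINARY on Windows and 0 elsewhere:

/*
 * Per POSIX, the "b" in fopen() modes is a no-op on Unix, so "rb" is
 * portable as written.  open() has no portable equivalent flag, which is
 * what a macro like PG_BINARY papers over.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0				/* no such flag outside Windows */
#endif

static void
open_both_ways(const char *path)
{
	int			fd = open(path, O_RDONLY | O_BINARY);	/* still wants a macro */
	FILE	   *fp = fopen(path, "rb");		/* the "b" is harmless everywhere */

	if (fd >= 0)
		close(fd);
	if (fp != NULL)
		fclose(fp);
}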

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 7:35 AM Robert Haas  wrote:
> On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro  wrote:
> > Looks like this somehow broke on a Windows box:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19
>
> So the issue here is that we are running this command:
>
> pg_dumpall --exclude-database .*
>
> And on that Windows machine, .* is being expanded to .gitignore, so
> pg_dumpall prints:
>
> pg_dumpall: error: improper qualified name (too many dotted names): .gitignore
>
> Instead of:
>
> pg_dumpall: error: improper qualified name (too many dotted names): .*
>
> I don't know why that glob-expansion only happens on jacana, and I
> don't know how to fix it, either.

Perhaps bowerbird and jacana have different versions of IPC::Run?  I
see some recent-ish changes to escaping logic in here from Noah:

https://github.com/toddr/IPC-Run/commits/master/lib/IPC/Run/Win32Helper.pm

Looks like the older version looks for meta characters not including
'*', and the later one uses Win32::ShellQuote::quote_native.




renumber_oids.pl needs some updates

2022-04-20 Thread Tom Lane
I did a test run of renumber_oids.pl to see if there would be any
problems when the time comes (pretty soon!) to run it for v15.
Depressingly enough, I found two problems:

1. When commit dfb75e478 invented DECLARE_UNIQUE_INDEX_PKEY,
it neglected to teach renumber_oids.pl about it.  I'm surprised
we did not notice this last year.

2. renumber_oids.pl failed miserably on pg_parameter_acl.h:

@@ -48,11 +48,11 @@ CATALOG(pg_parameter_acl,8924,ParameterAclRelationId) 
BKI_SHARED_RELATION
  */
 typedef FormData_pg_parameter_acl *Form_pg_parameter_acl;
 
-DECLARE_TOAST(pg_parameter_acl, 8925, 8926);
+DECLARE_TOAST(pg_parameter_acl, 6244, 6245);
 #define PgParameterAclToastTable 8925
 #define PgParameterAclToastIndex 8926

because of course it didn't know it should update the
PgParameterAclToastTable and PgParameterAclToastIndex macro definitions.
(We have this same coding pattern elsewhere, but I guess that
renumber_oids.pl has never previously been asked to renumber a shared
catalog.)

I think the right way to fix #2 is to put the responsibility for
generating the #define's into genbki.pl, instead of this mistake-prone
approach of duplicating the OID constants in the source code.

The attached proposed patch invents a variant macro
DECLARE_TOAST_WITH_MACRO for the relatively small number of cases
where we need such OID macros.  A different idea could be to require
all the catalog headers to define C macros for their toast tables
and change DECLARE_TOAST to a five-argument macro across the board.
However, that would require touching a bunch more places and inventing
a bunch more macro names, and it didn't really seem useful.

Thoughts?

regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index dbc3f87e28..0275795dea 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -94,6 +94,17 @@ sub ParseHeader
 			push @{ $catalog{toasting} },
 			  { parent_table => $1, toast_oid => $2, toast_index_oid => $3 };
 		}
+		elsif (/^DECLARE_TOAST_WITH_MACRO\(\s*(\w+),\s*(\d+),\s*(\d+),\s*(\w+),\s*(\w+)\)/)
+		{
+			push @{ $catalog{toasting} },
+			  {
+parent_table  => $1,
+toast_oid => $2,
+toast_index_oid   => $3,
+toast_oid_macro   => $4,
+toast_index_oid_macro => $5
+			  };
+		}
 		elsif (
 			/^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(.+)\)/)
 		{
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 2dfcdc5dad..2d02b02267 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -472,6 +472,16 @@ EOM
 	  $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid}
 	  if $catalog->{rowtype_oid_macro};
 
+	# Likewise for macros for toast and index OIDs
+	foreach my $toast (@{ $catalog->{toasting} })
+	{
+		printf $def "#define %s %s\n",
+		  $toast->{toast_oid_macro}, $toast->{toast_oid}
+		  if $toast->{toast_oid_macro};
+		printf $def "#define %s %s\n",
+		  $toast->{toast_index_oid_macro}, $toast->{toast_index_oid}
+		  if $toast->{toast_index_oid_macro};
+	}
 	foreach my $index (@{ $catalog->{indexing} })
 	{
 		printf $def "#define %s %s\n",
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 9fb0d2c83d..4ecd76f4be 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -55,9 +55,13 @@
  * need stable OIDs for shared relations, and that includes toast tables
  * of shared relations.
  *
- * The macro definition is just to keep the C compiler from spitting up.
+ * The DECLARE_TOAST_WITH_MACRO variant is used when C macros are needed
+ * for the toast table/index OIDs (usually only for shared catalogs).
+ *
+ * The macro definitions are just to keep the C compiler from spitting up.
  */
 #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable
+#define DECLARE_TOAST_WITH_MACRO(name,toastoid,indexoid,toastoidmacro,indexoidmacro) extern int no_such_variable
 
 /*
  * These lines are processed by genbki.pl to create the statements
diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
index 4b65e39a1f..3512601c80 100644
--- a/src/include/catalog/pg_authid.h
+++ b/src/include/catalog/pg_authid.h
@@ -55,9 +55,7 @@ CATALOG(pg_authid,1260,AuthIdRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(284
  */
 typedef FormData_pg_authid *Form_pg_authid;
 
-DECLARE_TOAST(pg_authid, 4175, 4176);
-#define PgAuthidToastTable 4175
-#define PgAuthidToastIndex 4176
+DECLARE_TOAST_WITH_MACRO(pg_authid, 4175, 4176, PgAuthidToastTable, PgAuthidToastIndex);
 
 DECLARE_UNIQUE_INDEX(pg_authid_rolname_index, 2676, AuthIdRolnameIndexId, on pg_authid using btree(rolname name_ops));
 DECLARE_UNIQUE_INDEX_PKEY(pg_authid_oid_index, 2677, AuthIdOidIndexId, on pg_authid using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index a9f4a8071f..e10e91c0af 100644
--- a/src/include/catalog/pg_database.h

Re: Dump/Restore of non-default PKs

2022-04-20 Thread Peter Eisentraut

On 18.04.22 22:48, Tom Lane wrote:

Why not just get rid of the limitation that constraint definitions don't
support non-default methods?

That approach would be doubling down on the assumption that we can always
shoehorn more custom options into SQL-standard constraint clauses, and
we'll never fall foul of shift/reduce problems or future spec additions.


When we do get the ability to create a table with a primary key with an 
underlying hash index, how would that be done?  Would the only way be


1. create the table without primary key
2. create the index
3. attach the index as primary key constraint

That doesn't sound attractive.




Re: pg14 psql broke \d datname.nspname.relname

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 8:38 AM Thomas Munro  wrote:
> On Thu, Apr 21, 2022 at 7:35 AM Robert Haas  wrote:
> > On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro  wrote:
> > > Looks like this somehow broke on a Windows box:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19
> >
> > So the issue here is that we are running this command:
> >
> > pg_dumpall --exclude-database .*
> >
> > And on that Windows machine, .* is being expanded to .gitignore, so
> > pg_dumpall prints:
> >
> > pg_dumpall: error: improper qualified name (too many dotted names): 
> > .gitignore
> >
> > Instead of:
> >
> > pg_dumpall: error: improper qualified name (too many dotted names): .*
> >
> > I don't know why that glob-expansion only happens on jacana, and I
> > don't know how to fix it, either.
>
> Perhaps bowerbird and jacana have different versions of IPC::Run?  I
> see some recent-ish changes to escaping logic in here from Noah:
>
> https://github.com/toddr/IPC-Run/commits/master/lib/IPC/Run/Win32Helper.pm
>
> Looks like the older version looks for meta characters not including
> '*', and the later one uses Win32::ShellQuote::quote_native.

This time with Andrew in CC.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 20, 2022 at 2:34 PM Tom Lane  wrote:
>> The attached draft patch attempts to improve this situation.
>> It reserves these OIDs, and creates the associated macros, through
>> the normal BKI infrastructure by adding entries in pg_database.dat.
>> We have to delete those rows again during initdb, which is slightly
>> ugly but surely no more so than initdb's other direct manipulations
>> of pg_database.

> I'm not sure I really like this approach, but if you're firmly
> convinced that it's better than cleaning up the loose ends in some
> other way, I'm not going to waste a lot of energy fighting about it.

Having just had to bury my nose in renumber_oids.pl, I thought of a
different approach we could take to expose these OIDs to Catalog.pm.
That's to invent a new macro that Catalog.pm recognizes, and write
something about like this in pg_database.h:

DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);

That would result in (a) the OIDs becoming known to Catalog.pm
as reserved, though it wouldn't have any great clue about their
semantics; and (b) this getting emitted into pg_database_d.h:

#define Template0ObjectId 4
#define PostgresObjectId 5

Then we'd not need the dummy entries in pg_database.dat, which
does seem cleaner now that I think about it.  A downside is that
with no context, Catalog.pm could not provide name translation
services during postgres.bki generation for such OIDs --- but
at least for these entries, we don't need that.
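
The C-visible side of such a macro would presumably be just another no-op
stub in genbki.h, in the style of the existing DECLARE_TOAST definitions;
here is a hypothetical sketch (the Catalog.pm/genbki.pl side is where the
real work would go).

/*
 * Hypothetical stub for genbki.h: these DECLARE_* lines only need to keep
 * the C compiler quiet; Catalog.pm would parse them and genbki.pl would emit
 * the #defines into the corresponding *_d.h header.
 */
#define DECLARE_OID_DEFINING_MACRO(name,oid) extern int no_such_variable

/* and, in pg_database.h, the usage proposed above: */
DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);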

If that seems more plausible to you I'll set about preparing a patch.

regards, tom lane




Re: renumber_oids.pl needs some updates

2022-04-20 Thread Peter Eisentraut

On 20.04.22 22:45, Tom Lane wrote:

I think the right way to fix #2 is to put the responsibility for
generating the #define's into genbki.pl, instead of this mistake-prone
approach of duplicating the OID constants in the source code.

The attached proposed patch invents a variant macro
DECLARE_TOAST_WITH_MACRO for the relatively small number of cases
where we need such OID macros.


This makes sense.

A more elaborate (future) project would be to have genbki.pl generate 
all of IsSharedRelation(), which is the only place these toast-table-OID 
macros are used, AFAICT.






Re: [RFC] building postgres with meson -v8

2022-04-20 Thread Andres Freund
Hi,

On 2022-04-13 12:26:05 +0200, Peter Eisentraut wrote:
> Some feedback and patches for your branch at
> 3274198960c139328fef3c725cee1468bbfff469:

Thanks! I just rebased the branch, will merge your changes once the fallout
from that is fixed...


> 0001-Install-a-few-more-files.patch
> 
> These are just some files that were apparently forgotten to be installed so
> far.

> 0002-Adjust-some-header-file-installation-paths.patch
> 
> The installation of server headers is apparently still in progress. This
> just adjusts the installation directory of those that are already being
> dealt with, so they match the existing installation layout.


Yea. I've not at all paid attention to that so far, besides getting tests to
pass.


> 0003-Fix-warnings-about-deprecated-features.patch
> 
> This fixes some deprecation warnings and raises the requirement to 0.56.

I don't see any deprecation warnings - I see some notices about *future*
deprecated features being used:

NOTICE: Future-deprecated features used:
 * 0.55.0: {'ExternalProgram.path'}
 * 0.56.0: {'meson.source_root', 'meson.build_root'}

(i.e. once the minimum version is increased to > 0.54, those will trigger
deprecation warnings)

What are you seeing with what version?


> I'm not sure why the current cutoff at 0.54 was chosen.  Perhaps that could
> be documented.

Not quite sure why I ended up with 0.54. We definitely should require at most
0.56, as that's the last version supporting python 3.5.


> 0004-Install-postmaster-symlink.patch
> 
> This needs 0.61, so maybe it's a bit too new.

Yea, that's too new. I think we can just create the symlink using ln or such
if we need it.


> Or we could get rid of the postmaster symlink altogether?

But that seems like a better approach.


> 0005-Workaround-for-Perl-detection.patch
> 
> This is needed on my system to get the Perl detection to pass.  If I look at
> the equivalent configure code, some more refinement appears to be needed in
> this area.

> From 1f80e1ebb8efeb0eba7d57032282520fd6455b0d Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Wed, 13 Apr 2022 11:50:52 +0200
> Subject: [PATCH 5/5] Workaround for Perl detection
> 
> ---
>  meson.build | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/meson.build b/meson.build
> index 1bf53ea24d..e33ed11b08 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -545,9 +545,9 @@ else
># file existence.
>if perl_may_work
>  perl_ccflags += ['-I@0@'.format(perl_inc_dir)]
> -if host_machine.system() == 'darwin'
> -  perl_ccflags += ['-iwithsysroot', perl_inc_dir]
> -endif
> +#if host_machine.system() == 'darwin'
> +#  perl_ccflags += ['-iwithsysroot', perl_inc_dir]
> +#endif
>endif

What problem do you see without this? It did build on CI and on my m1 mini box
as is...

Greetings,

Andres Freund




Re: Add version and data directory to initdb output

2022-04-20 Thread Peter Eisentraut

On 19.04.22 15:55, David G. Johnston wrote:
The motivating situation had me placing it as close to the last line as 
possible so my 8 line or so tmux panel would show it to me without 
scrolling.  The version is all I cared about, but when writing the patch 
the path seemed to be at least worth considering.


As for "Success", I'm confused about the --no-instructions choice to 
change it the way it did, but given that precedent I only felt it 
important to leave the word Success as the leading word on a line. 
Scripts should be triggering on the exit code anyway and presently 
--no-instructions removes the Success acknowledgement completely anyway.


The order of outputs of initdb seems to be approximately

1. These are the settings I will use based on what you told me.
2. This is what I'm doing right now.
3. Here's what you can do next.

Your additions would appear to fall into bucket #1.  So I think adding 
them near the start of the output makes more sense.  Otherwise, one 
could also argue that all the locale information etc. should also be 
repeated at the end, in case one forgot them or whatever.





Re: renumber_oids.pl needs some updates

2022-04-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 20.04.22 22:45, Tom Lane wrote:
>> The attached proposed patch invents a variant macro
>> DECLARE_TOAST_WITH_MACRO for the relatively small number of cases
>> where we need such OID macros.

> This makes sense.

> A more elaborate (future) project would be to have genbki.pl generate 
> all of IsSharedRelation(), which is the only place these toast-table-OID 
> macros are used, AFAICT.

Perhaps.  We invent shared catalogs at a slow enough rate that I'm
not sure the effort would ever pay for itself in person-hours,
but maybe making such invention a trifle less error-prone is
worth something.

I'd still want to keep this form of DECLARE_TOAST, in case someone
comes up with a different reason to want macro names for toast OIDs.

regards, tom lane




Re: Add version and data directory to initdb output

2022-04-20 Thread David G. Johnston
On Wed, Apr 20, 2022 at 2:04 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 19.04.22 15:55, David G. Johnston wrote:
> > The motivating situation had me placing it as close to the last line as
> > possible so my 8 line or so tmux panel would show it to me without
> > scrolling.  The version is all I cared about, but when writing the patch
> > the path seemed to be at least worth considering.
> >
> > As for "Success", I'm confused about the --no-instructions choice to
> > change it the way it did, but given that precedent I only felt it
> > important to leave the word Success as the leading word on a line.
> > Scripts should be triggering on the exit code anyway and presently
> > --no-instructions removes the Success acknowledgement completely anyway.
>
> The order of outputs of initdb seems to be approximately
>
> 1. These are the settings I will use based on what you told me.
> 2. This is what I'm doing right now.
> 3. Here's what you can do next.
>
> Your additions would appear to fall into bucket #1.  So I think adding
> them near the start of the output makes more sense.  Otherwise, one
> could also argue that all the locale information etc. should also be
> repeated at the end, in case one forgot them or whatever.
>

I agree with the observation, but initdb is fast enough and non-interactive
that the proposed order isn't particularly appealing.

Thus either:

1. Initialization is running ... Here's what we are doing.
2. All done!  Here's what we did.
3. Here's what you can do next.

or

1. These are the settings I will use based on what you told me.
2. This is what I'm doing right now.
3. All done! Here's what you ended up with (can repeat items from 1 if
desired...)
4. Here's what you can do next.

I'd rather do the first proposal given buy-in.  Though I would have
concerns about what the output looks like upon failure.

I'm basically proposing the second option, add a formal "All done!" section
and recap what the final result is.  I'd be content with having the version
appear in both 1 and 3 in that scenario.  It isn't a frequently executed
command, it is already verbose, and when run interactively during development
I don't want to have to dedicate a 20-line panel just to see "All Done!" and
a key attribute or two (locale and path seem useful) without scrolling.

If the consensus is to place it before, and only before, the "this is what
I'm doing right now" stuff, that is better than nothing, but the choice of
not doing so was intentional.

David J.


Re: typos

2022-04-20 Thread Alvaro Herrera
On 2022-Apr-19, Alvaro Herrera wrote:

> I propose we standardize on Zstd everywhere.
> Users can look it up if they're really interested.

So the attached.

There are other uses of zstd, but those are referring to the
executable program.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)
>From 256904cc5a8718ac081dbf51f6263ab022e503f6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 20 Apr 2022 23:26:28 +0200
Subject: [PATCH] Zstd

---
 doc/src/sgml/install-windows.sgml | 6 +++---
 doc/src/sgml/installation.sgml| 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 43cc5f6f5b..104670d295 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -307,9 +307,9 @@ $ENV{MSBFLAGS}="/m";
 
 
 
- ZSTD
+ Zstd
  
-  Required for supporting ZSTD compression
+  Required for supporting Zstd compression
   method. Binaries and source can be downloaded from
   https://github.com/facebook/zstd/releases";>.
  
@@ -560,7 +560,7 @@ $ENV{PROVE_TESTS}='t/020*.pl t/010*.pl'
 
  ZSTD
  
-  Path to a zstd command. The default is
+  Path to a Zstd command. The default is
   zstd, which will search for a command by that
   name in the configured PATH.
  
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index df32025a86..c60484e221 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -273,7 +273,7 @@ su - postgres
 
 
  
-  You need zstd, if you want to support
+  You need Zstd, if you want to support
   compression of data with that method; see
   .
   The minimum required version is 1.4.0.
@@ -996,7 +996,7 @@ build-postgresql:
--with-zstd

 
- Build with ZSTD compression support.
+ Build with Zstd compression support.
 

   
-- 
2.30.2



Assert failure in CTE inlining with view and correlated subquery

2022-04-20 Thread Tomas Vondra
Hi,

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE. Consider a simple
example like this:

create table results (
  id  serial primary key,
  run text,
  tps float4
);

create view results_agg as
with base_tps as (
  select run, tps from results
)
select
run,
count(*) as runs,

(select tps from base_tps b where b.run = r.run) AS base_tps

from results r
group by
run
order by
run;

explain SELECT run FROM results_agg ORDER BY 1;


This crashes on this assert in inline_cte():

Assert(context.refcount == 0);

because the refcount value remains 1. There's a backtrace attached.

I don't know why exactly this happens, my knowledge of CTE inlining is
somewhat limited. The counter is clearly out of sync, but here are a couple
more observations:

1) it fails all the way back to PG12, where CTE inlining was added

2) it does not happen if the CTE is defined as MATERIALIZED

   QUERY PLAN
-
 Subquery Scan on results_agg
   ->  Sort
 Sort Key: r.run
 CTE base_tps
   ->  Seq Scan on results
 ->  HashAggregate
   Group Key: r.run
   ->  Seq Scan on results r
(8 rows)

3) without asserts, it seems to work and the query generates this plan

   QUERY PLAN
-
 Subquery Scan on results_agg
   ->  Sort
 Sort Key: r.run
 ->  HashAggregate
   Group Key: r.run
   ->  Seq Scan on results r
(6 rows)

4) it does not seem to happen without the view, i.e. this works

explain
with base_tps as (
  select run, tps from results
)
select run from (
  select
run,
count(*) as runs,

(select tps from base_tps b where b.run = r.run) AS base_tps

from results r
group by
run
order by
run
) results_agg order by 1;

The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

test.sql
Description: application/sql
Missing separate debuginfos, use: dnf debuginfo-install 
libgcc-11.2.1-9.fc34.x86_64
(gdb) bt
#0  0x7159d9be82a2 in raise () from /lib64/libc.so.6
#1  0x7159d9bd18a4 in abort () from /lib64/libc.so.6
#2  0x0096d3fa in ExceptionalCondition 
(conditionName=conditionName@entry=0xae4ed3 "context.refcount == 0", 
errorType=errorType@entry=0x9c3017 "FailedAssertion", 
fileName=fileName@entry=0xae4da6 "subselect.c", 
lineNumber=lineNumber@entry=1166) at assert.c:69
#3  0x0078acda in inline_cte (cte=0x134e240, root=0x1242298) at 
subselect.c:1166
#4  SS_process_ctes (root=root@entry=0x1242298) at subselect.c:963
#5  0x00782fb1 in subquery_planner (glob=0x134dbc0, 
parse=parse@entry=0x134daa8, parent_root=parent_root@entry=0x13491e8, 
hasRecursion=hasRecursion@entry=false, tuple_fraction=0) at planner.c:650
#6  0x00752afe in set_subquery_pathlist (rte=, 
rti=, rel=, root=) at 
allpaths.c:2572
#7  set_rel_size (root=, rel=, rti=, rte=) at allpaths.c:424
#8  0x00755080 in set_base_rel_sizes (root=) at 
allpaths.c:325
#9  make_one_rel (root=root@entry=0x13491e8, joinlist=joinlist@entry=0x134e078) 
at allpaths.c:187
#10 0x0077b4c2 in query_planner (root=root@entry=0x13491e8, 
qp_callback=qp_callback@entry=0x77c450 , 
qp_extra=qp_extra@entry=0x7fff5a500ba0) at planmain.c:276
#11 0x00781122 in grouping_planner (root=, 
tuple_fraction=) at planner.c:1467
#12 0x007838a1 in subquery_planner (glob=glob@entry=0x134dbc0, 
parse=parse@entry=0x1241818, parent_root=parent_root@entry=0x0, 
hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0) 
at planner.c:1044
#13 0x00783ee3 in standard_planner (parse=0x1241818, 
query_string=, cursorOptions=2048, boundParams=) 
at planner.c:406
#14 0x0084e7a8 in pg_plan_query (querytree=0x1241818, 
query_string=query_string@entry=0x12407d0 "SELECT run FROM results_agg ORDER BY 
1;", cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) 
at postgres.c:883
#15 0x0084e8a1 in pg_plan_queries (querytrees=0x134bf00, 
query_string=query_string@entry=0x12407d0 "SELECT run FROM results_agg ORDER BY 
1;", cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) 
at postgres.c:975
#16 0x0084ebab in exec_simple_query (query_string=0x12407d0 "SELECT run 
FROM results_agg ORDER BY 1;") at postgres.c:1169
#17 0x008506df in PostgresMain (dbname=, 
username=) at postgres.c:4542
#18 0x007c1d12 in BackendRun (port=0x1265e90, port=0x1265e90) at 

Re: typos

2022-04-20 Thread Justin Pryzby
On Wed, Apr 20, 2022 at 11:32:08PM +0200, Alvaro Herrera wrote:
> On 2022-Apr-19, Alvaro Herrera wrote:
> 
> > I propose we standardize on Zstd everywhere.
> > Users can look it up if they're really interested.
> 
> So the attached.
> 
> There are other uses of zstd, but those are referring to 
> the
> executable program.

This one shouldn't be changed, or not like this?

> @@ -560,7 +560,7 @@ $ENV{PROVE_TESTS}='t/020*.pl t/010*.pl'
>  
>   ZSTD
>   
> -  Path to a zstd command. The default is
> +  Path to a Zstd command. The default is
>zstd, which will search for a command by that
>name in the configured PATH.
>   

Maybe it should say s/a/the/, like:

-  Path to a zstd command. The default is
+  Path to the zstd command. The default is




More problems with VacuumPageHit style global variables

2022-04-20 Thread Peter Geoghegan
My recent bugfix commit d3609dd2 addressed an issue with VACUUM
VERBOSE. It would aggregate buffers hit/missed/dirtied counts
incorrectly (by double counting), though only when there are multiple
heap rels processed by the same VACUUM command. It failed to account
for the fact that the VacuumPageHit, VacuumPageMiss, and
VacuumPageDirty global variables were really only designed to work in
autovacuum (see 2011 commit 9d3b5024).

I just realized that there is one remaining problem: parallel VACUUM
doesn't care about these global variables, so there will still be
discrepancies there. I can't really blame that on parallel VACUUM,
though, because vacuumparallel.c at least copies buffer usage counters
from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE
space). I wonder why we still have these seemingly redundant global
variables, which are maintained by bufmgr.c (alongside the
pgBufferUsage stuff). It looks like recent commit 5dc0418fab
("Prefetch data referenced by the WAL, take II") taught bufmgr.c to
instrument all buffer accesses. So it looks like we just don't need
VacuumPageHit and friends anymore.

Wouldn't it be better if every VACUUM used the same generic approach,
using pgBufferUsage? As a general rule code that only runs in
autovacuum is a recipe for bugs. It looks like VacuumPageHit is
maintained based on different rules to pgBufferUsage.shared_blks_hit
in bufmgr.c (just as an example), which seems like a bad sign.
Besides, the pgBufferUsage counters have more information, which seems
like it might be useful to the lazyvacuum.c instrumentation.
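
To sketch what I mean by the generic approach (purely illustrative; the
function and report format below are hypothetical, only pgBufferUsage and its
shared_blks_* counters are real):

#include "postgres.h"
#include "executor/instrument.h"

/* Hypothetical wrapper: snapshot pgBufferUsage, do the work, report deltas. */
static void
vacuum_rel_with_buffer_stats(void)
{
    BufferUsage before = pgBufferUsage; /* level at the start */

    /* ... perform the heap/index vacuuming work here ... */

    /* Per-VACUUM counters as level changes, instead of relying on the
     * VacuumPageHit/VacuumPageMiss/VacuumPageDirty globals. */
    elog(DEBUG1,
         "buffer usage: " INT64_FORMAT " hits, " INT64_FORMAT " misses, "
         INT64_FORMAT " dirtied",
         pgBufferUsage.shared_blks_hit - before.shared_blks_hit,
         pgBufferUsage.shared_blks_read - before.shared_blks_read,
         pgBufferUsage.shared_blks_dirtied - before.shared_blks_dirtied);
}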

One question for the author of the WAL prefetch patch, Thomas (CC'd):
It's not 100% clear what the expectation is with pgBufferUsage when
track_io_timing is off, so are fields like
pgBufferUsage.shared_blks_hit (i.e. those that don't have a
time/duration component) officially okay to rely on across the board?
It looks like they are okay to rely on (even when track_io_timing is
off), but it would be nice to put that on a formal footing, if it
isn't already.

-- 
Peter Geoghegan




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 4:56 PM Tom Lane  wrote:
> Having just had to bury my nose in renumber_oids.pl, I thought of a
> different approach we could take to expose these OIDs to Catalog.pm.
> That's to invent a new macro that Catalog.pm recognizes, and write
> something about like this in pg_database.h:
>
> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);
>
> If that seems more plausible to you I'll set about preparing a patch.

I like it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread Tyler Brock
I think this makes sense but I wanted to get confirmation:

I created a table with a column having the type int4 (integer). When I
insert a row with a number into that column and get it back out I've
observed a discrepancy:

The DataRow message has the field encoded as an ASCII ‘7’ with a column
length of 1 despite the RowDescription having a column length 4. I assume
that this is because it’s a simple query (Q) and therefore the format code
for all columns is 0 (for text format).

It makes sense that at the time the RowDescription is written out that it
can’t possibly know how many bytes the textual representation of each int
will take so it just uses the length of the underlying type.

Is this accurate?

-Tyler


doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml

2022-04-20 Thread David G. Johnston
Hackers,

The new cumulative stats subsystem no longer has a "lost under heavy load"
problem so that parenthetical should go (or at least be modified).

These stats can be reset so some discussion about how the system uses them
given that possibility seems like it would be good to add here.  I'm not
sure what that should look like though.

David J.


diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 04a04e0e5f..360807c8f9 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert threshold
+ vacuum insert scale fac
 tuples to be frozen by earlier vacuums.  The number of obsolete tuples
and
 the number of inserted tuples are obtained from the cumulative
statistics system;
 it is a semi-accurate count updated by each UPDATE,
-DELETE and INSERT operation.
 (It is
-only semi-accurate because some information might be lost under heavy
-load.)  If the relfrozenxid value of the
table
+DELETE and INSERT operation.
+If the relfrozenxid value of the table
 is more than vacuum_freeze_table_age transactions
old,
 an aggressive vacuum is performed to freeze old tuples and advance
 relfrozenxid; otherwise, only pages that
have been modified


Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread David G. Johnston
On Wed, Apr 20, 2022 at 4:39 PM Tyler Brock  wrote:

> I think this makes sense but I wanted to get confirmation:
>
> I created a table with a column having the type int4 (integer). When I
> insert a row with a number into that column and get it back out I've
> observed a discrepancy:
>
> The DataRow message has the field encoded as an ASCII ‘7’ with a column
> length of 1 despite the RowDescription having a column length 4. I assume
> that this is because it’s a simple query (Q) and therefore the format code
> for all columns is 0 (for text format).
>
> It makes sense that at the time the RowDescription is written out that it
> can’t possibly know how many bytes the textual representation of each int
> will take so it just uses the length of the underlying type.
>
> Is this accurate?
>
>
You probably shouldn't think of DataRow as giving you a "column length" -
it is simply giving you the number of bytes you need to read to retrieve
all of the bytes for the column and thus position your read pointer at the
data length Int32 for the subsequent column (which you do iteratively Int16
column count times).

You now have bytes for columnN - which you need to interpret via
RowDescription to transform the raw protocol bytes into a meaningful datum.

You don't care whether the source API was simple or not - RowDescription
will tell you what you need to know to interpret the value - it is all
self-contained.  But yes, because it is a simple query the RowDescription
meta-data will inform you that all of the bytes represent (in aggregate ?)
the textual representation of the data.
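
As a rough illustration of that read loop (not taken from any particular
driver; the Buf type and read_* helpers are hypothetical stand-ins for
whatever buffer API a client uses):

#include <stdint.h>

typedef struct Buf Buf;
extern int16_t read_int16(Buf *buf);
extern int32_t read_int32(Buf *buf);
extern const char *read_bytes(Buf *buf, int32_t len);

/* Walk the body of one DataRow message. */
static void
walk_data_row(Buf *buf)
{
    int16_t ncols = read_int16(buf);            /* column-value count */

    for (int i = 0; i < ncols; i++)
    {
        int32_t len = read_int32(buf);          /* value length in bytes, -1 for NULL */

        if (len == -1)
            continue;                           /* NULL: no data bytes follow */

        const char *val = read_bytes(buf, len); /* exactly 'len' bytes of column data */

        /* Interpret 'val' using the matching RowDescription entry
         * (format code and type OID); the length advertised there is not
         * the on-the-wire length. */
        (void) val;
    }
}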

David J.


Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread Tyler Brock
 For sure, I’m thinking of it that way. Thanks for confirming.

What I don’t understand is that if I respond to psql with the
RowDescription indicating the format code is 1 for binary (and encode it
that way, with 4 bytes, in the DataRow) it doesn’t render the number in the
results.

-Tyler


On Apr 20, 2022 at 8:00:18 PM, David G. Johnston 
wrote:

> On Wed, Apr 20, 2022 at 4:39 PM Tyler Brock  wrote:
>
>> I think this makes sense but I wanted to get confirmation:
>>
>> I created a table with a column having the type int4 (integer). When I
>> insert a row with a number into that column and get it back out I've
>> observed a discrepancy:
>>
>> The DataRow message has the field encoded as an ASCII ‘7’ with a column
>> length of 1 despite the RowDescription having a column length 4. I assume
>> that this is because it’s a simple query (Q) and therefore the format code
>> for all columns is 0 (for text format).
>>
>> It makes sense that at the time the RowDescription is written out that it
>> can’t possibly know how many bytes the textual representation of each int
>> will take so it just uses the length of the underlying type.
>>
>> Is this accurate?
>>
>>
> You probably shouldn't think of DataRow as giving you a "column length" -
> it is simply giving you the number of bytes you need to read to retrieve
> all of the bytes for the column and thus position your read pointer at the
> data length Int32 for the subsequent column (which you do iteratively Int16
> column count times).
>
> You now have bytes for columnN - which you need to interpret via
> RowDescription to transform the raw protocol bytes into a meaningful datum.
>
> You don't care whether the source API was simple or not - RowDescription
> will tell you what you need to know to interpret the value - it is all
> self-contained.  But yes, because it is a simple query the RowDescription
> meta-data will inform you that all of the bytes represent (in aggregate ?)
> the textual representation of the data.
>
> David J.
>
>


Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread David G. Johnston
On Wed, Apr 20, 2022 at 5:11 PM Tyler Brock  wrote:

> For sure, I’m thinking of it that way. Thanks for confirming.
>
> What I don’t understand is that if I respond to psql with the
> RowDescription indicating the format code is 1 for binary (and encode it
> that way, with 4 bytes, in the DataRow) it doesn’t render the number in the
> results.
>
>>
>>
Please don't top-post.

psql is a command line program, the server is PostgreSQL or postgres.

I'm not familiar with interacting with the server in C or at the protocol
level; I have no idea what that sentence is supposed to mean.  But
RowDescription seems to be strictly informative so how would you "respond
to psql with [it]"?

David J.


Re: CLUSTER sort on abbreviated expressions is broken

2022-04-20 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 11:01 AM Peter Geoghegan  wrote:
> Attached patch fixes the issue, and includes the test case that you posted.

Pushed a similar patch just now. Backpatched to all supported branches.

-- 
Peter Geoghegan




Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread Tyler Brock
 I’m not sure what top-posting is?

I’m talking about responding to psql the command line program.

-Tyler


On Apr 20, 2022 at 8:16:28 PM, David G. Johnston 
wrote:

> On Wed, Apr 20, 2022 at 5:11 PM Tyler Brock  wrote:
>
>> For sure, I’m thinking of it that way. Thanks for confirming.
>>
>> What I don’t understand is that if I respond to psql with the
>> RowDescription indicating the format code is 1 for binary (and encode it
>> that way, with 4 bytes, in the DataRow) it doesn’t render the number in the
>> results.
>>
>>>
>>>
> Please don't top-post.
>
> psql is a command line program, the server is PostgreSQL or postgres.
>
> I'm not familiar with interacting with the server in C or at the protocol
> level; I have no idea what that sentence is supposed to mean.  But
> RowDescription seems to be strictly informative so how would you "respond
> to psql with [it]"?
>
> David J.
>
>
>


Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread David G. Johnston
On Wed, Apr 20, 2022 at 5:21 PM Tyler Brock  wrote:

> I’m not sure what top-posting is?
>

It's when you place your replies before what you are replying to.
https://en.wikipedia.org/wiki/Posting_style

Unlike mine, which is inline-posting, where the reply is after the thing
being replied to, trimming unneeded context as appropriate.

>
> I’m talking about responding to psql the command line program.
>
>
Ok.  I'm outside my league then.

David J.


Assorted small doc patches

2022-04-20 Thread David G. Johnston
Hackers,

I posted all of these elsewhere (docs, bugs) but am consolidating them here
going forward.


v0001-database-default-name (-bugs, with a related cleanup suggestion as
well)
https://www.postgresql.org/message-id/flat/CAKFQuwZvHH1HVSOu7EYjvshynk4pnDwC5RwkF%3DVfZJvmUskwrQ%40mail.gmail.com#0e6d799478d88aee93402bec35fa64a2


v0002-doc-extension-dependent-routine-behavior (-general, reply to user
confusion)
https://www.postgresql.org/message-id/CAKFQuwb_QtY25feLeh%3D8uNdnyo1H%3DcN4R3vENsUwQzJP4-0xZg%40mail.gmail.com


v0001-doc-savepoint-name-reuse (-docs, reply to user request for
improvement)
https://www.postgresql.org/message-id/CAKFQuwYzSb9OW5qTFgc0v9RWMN8bX83wpe8okQ7x6vtcmfA2KQ%40mail.gmail.com


v0001-on-conflict-excluded-is-name-not-table (-docs, figured out while
trying to improve the docs to reduce user confusion in this area)
https://www.postgresql.org/message-id/flat/CAKFQuwYN20c0%2B7kKvm3PBgibu77BzxDvk9RvoXBb1%3Dj1mDODPw%40mail.gmail.com#ea79c88b55fdccecbd2c4fe549f321c9


v0001-doc-make-row-estimation-example-match-prose (-docs, reply to user
pointing of an inconsistency)
https://www.postgresql.org/message-id/CAKFQuwax7V5R_rw%3DEOWmy%3DTBON6v3sveBx_WvwsENskCL5CLQQ%40mail.gmail.com

Thanks!

David J.


v0001-doc-make-row-estimation-example-match-prose.patch
Description: Binary data


v0001-database-default-name.patch
Description: Binary data


v0001-doc-savepoint-name-reuse.patch
Description: Binary data


v0001-on-conflict-excluded-is-name-not-table.patch
Description: Binary data


v0002-doc-extension-dependent-routine-behavior.patch
Description: Binary data


Re: CLUSTER sort on abbreviated expressions is broken

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 12:18 PM Peter Geoghegan  wrote:
> On Tue, Apr 12, 2022 at 11:01 AM Peter Geoghegan  wrote:
> > Attached patch fixes the issue, and includes the test case that you posted.
>
> Pushed a similar patch just now. Backpatched to all supported branches.

Thanks.




RE: Logical replication timeout problem

2022-04-20 Thread wangw.f...@fujitsu.com
On Wed, Apr 20, 2022 at 6:13 PM Amit Kapila  wrote:
> On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila  wrote:
> >
> > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada
>  wrote:
> > >
> > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com
> > >  wrote:
> > > > ```
> > >
> > > I'm concerned that this 4-byte padding at the end of the struct could
> > > depend on platforms (there might be no padding in 32-bit platforms?).
> > >
> >
> > Good point, but ...
> >
> > > It seems to me that it's better to put it after fast_forward where the
> > > new field should fall within the padding space.
> > >
> >
> > Can we add the variable in between the existing variables in the
> > structure in the back branches?
> >
> 
> I think it should be fine if it falls in the padding space. We have
> done similar changes recently in back-branches [1]. I think it would
> be then better to have it in the same place in HEAD as well?
> 
> [1] -
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346
> 876aad4941797c2255a21bdac74739
Thanks for your comments.

The comments by Sawada-San sound reasonable to me.
After checking, I found that the padding in HEAD is the same as in REL14,
so I changed the approach of the patch for HEAD to match the patch for REL14.

On Wed, Apr 20, 2022 at 3:21 PM Masahiko Sawada  wrote:
> I'm concerned that this 4-byte padding at the end of the struct could
> depend on platforms (there might be no padding in 32-bit platforms?).
> It seems to me that it's better to put it after fast_forward where the
> new field should fall within the padding space.
Fixed. Added the new variable after fast_forward.
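
To make the placement concrete, an illustrative fragment (only the members
discussed in this thread are shown, the rest of LogicalDecodingContext is
elided, and the exact padding is platform/ABI dependent):

typedef struct LogicalDecodingContext
{
    /* ... existing members ... */
    bool        fast_forward;
    bool        end_xact;       /* new flag placed right after fast_forward
                                 * so it falls into the padding that already
                                 * follows it; the struct size is unchanged */
    /* ... */
    bool        prepared_write;
    /* ... */
    TransactionId write_xid;
    /* ... remaining members ... */
} LogicalDecodingContext;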

> BTW the changes in
> REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch,
> adding end_xact to LogicalDecodingContext, seems good to me and it
> might be better than the approach of v17 patch from plugin developers’
> perspective? This is because they won’t need to pass true/false to
> end_xact of  OutputPluginUpdateProgress(). Furthermore, if we do what
> we do in update_replication_progress() in
> OutputPluginUpdateProgress(), what plugins need to do will be just to
> call OutputPluginUpdate() in every callback and they don't need to
> have the CHANGES_THRESHOLD logic. What do you think?
Changed the approach of the patch for HEAD. (The size of the structure does not
change.) Also moved the logic of the function update_replication_progress into
OutputPluginUpdateProgress.

Attached the patches, with the changes suggested by Sawada-San:
1. Changed the position of the new variable in the structure.
2. Changed the approach of the patch for HEAD.
3. Deleted the new function update_replication_progress.

Regards,
Wang wei


HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch
Description:  HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch


REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description:  REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch


Re: Logical replication timeout problem

2022-04-20 Thread Amit Kapila
On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada  wrote:
>
> On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila  wrote:
> >
>
> > I think it would
> > be then better to have it in the same place in HEAD as well?
>
> As far as I can see in the v17 patch, which is for HEAD, we don't add
> a variable to LogicalDecodingContext, but did you refer to another
> patch?
>

No, I thought it is better to follow the same approach in HEAD as
well. Do you see any problem with it?

-- 
With Regards,
Amit Kapila.




Re: typos

2022-04-20 Thread Amit Kapila
On Wed, Apr 20, 2022 at 5:31 PM Alvaro Herrera  wrote:
>
> On 2022-Apr-20, Amit Kapila wrote:
>
> > Your proposed changes look good to me but I think all these places
> > need to mention 'column list' as well because the behavior is the same
> > for it.
>
> Hmm, you're right.  Added that, and changed the wording somewhat because
> some things read awkwardly.  Here's the result in patch form.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: More problems with VacuumPageHit style global variables

2022-04-20 Thread Thomas Munro
On Thu, Apr 21, 2022 at 10:03 AM Peter Geoghegan  wrote:
> I just realized that there is one remaining problem: parallel VACUUM
> doesn't care about these global variables, so there will still be
> discrepancies there. I can't really blame that on parallel VACUUM,
> though, because vacuumparallel.c at least copies buffer usage counters
> from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE
> space). I wonder why we still have these seemingly redundant global
> variables, which are maintained by bufmgr.c (alongside the
> pgBufferUsage stuff). It looks like recent commit 5dc0418fab
> ("Prefetch data referenced by the WAL, take II") taught bufmgr.c to
> instrument all buffer accesses. So it looks like we just don't need
> VacuumPageHit and friends anymore.

Yeah, that sounds right.

> Wouldn't it be better if every VACUUM used the same generic approach,
> using pgBufferUsage? As a general rule code that only runs in
> autovacuum is a recipe for bugs. It looks like VacuumPageHit is
> maintained based on different rules to pgBufferUsage.shared_blks_hit
> in bufmgr.c (just as an example), which seems like a bad sign.
> Besides, the pgBufferUsage counters have more information, which seems
> like it might be useful to the lazyvacuum.c instrumentation.

+1

> One question for the author of the WAL prefetch patch, Thomas (CC'd):
> It's not 100% clear what the expectation is with pgBufferUsage when
> track_io_timing is off, so are fields like
> pgBufferUsage.shared_blks_hit (i.e. those that don't have a
> time/duration component) officially okay to rely on across the board?
> It looks like they are okay to rely on (even when track_io_timing is
> off), but it would be nice to put that on a formal footing, if it
> isn't already.

Right, that commit did this, plus the local variant:

@@ -680,6 +682,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber
forkNum, BlockNumber blockNum,
else
PinBuffer_Locked(bufHdr);   /* pin
for first time */

+   pgBufferUsage.shared_blks_hit++;
+
return true;
}

I should perhaps have committed those changes separately with their
own explanation, since it was really an oversight in commit
2f27f8c5114 that this type of hit wasn't counted (as noted by Julien
in review of the WAL prefetcher).  I doubt anyone else has discovered
that function, which has no caller in PG14.

As for your general question, I think you must be right.  From a quick
rummage around in the commit log, it does appear that commit cddca5ec
(2009), which introduced pgBufferUsage, always bumped the counters
unconditionally.  It predated track_io_timing by years (40b9b957694
(2012)), and long before that the Berkeley code already had a simpler
thing along those lines (ReadBufferCount, BufferHitCount etc).  I
didn't look up the discussion, but I wonder if the reason commit
9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of
measuring level changes in pgBufferUsage is that pgBufferUsage didn't
have a dirty count until commit 2254367435f (2012), and once the
authors had decided they'd need a new special counter for that, they
continued down that path and added the others too?




Re: More problems with VacuumPageHit style global variables

2022-04-20 Thread Peter Geoghegan
On Wed, Apr 20, 2022 at 7:50 PM Thomas Munro  wrote:
> As for your general question, I think you must be right.  From a quick
> rummage around in the commit log, it does appear that commit cddca5ec
> (2009), which introduced pgBufferUsage, always bumped the counters
> unconditionally.  It predated track_io_timing by years (40b9b957694
> (2012)), and long before that the Berkeley code already had a simpler
> thing along those lines (ReadBufferCount, BufferHitCount etc).  I
> didn't look up the discussion, but I wonder if the reason commit
> 9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of
> measuring level changes in pgBufferUsage is that pgBufferUsage didn't
> have a dirty count until commit 2254367435f (2012), and once the
> authors had decided they'd need a new special counter for that, they
> continued down that path and added the others too?

I knew about pgBufferUsage, and I knew about
VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make
the very obvious connection between the two until today. I am probably
not the only one.

-- 
Peter Geoghegan




RE: Data is copied twice when specifying both child and parent table in publication

2022-04-20 Thread wangw.f...@fujitsu.com
On Tue, Apr 19, 2022 4:53 PM Shi, Yu/侍 雨  wrote:
> On Tue, Apr 19, 2022 3:05 PM houzj.f...@fujitsu.com 
> wrote:
> >
> > > -Original Message-
> > > From: Wang, Wei/王 威 
> > On Thursday, April 7, 2022 11:08 AM
> > >
> > > On Thur, Mar 10, 2021 at 10:08 AM houzj.f...@fujitsu.com wrote:
> > > > Hi,
> > > >
> > > > When reviewing some logical replication related features. I noticed
> another
> > > > possible problem if the subscriber subscribes multiple publications 
> > > > which
> > > > publish parent and child table.
> > > >
> > > > For example:
> > > >
> > > > pub
> > > > create table t (a int, b int, c int) partition by range (a);
> > > > create table t_1 partition of t for values from (1) to (10);
> > > >
> > > > create publication pub1 for table t
> > > >   with (PUBLISH_VIA_PARTITION_ROOT);
> > > > create publication pub2 for table t_1
> > > >   with (PUBLISH_VIA_PARTITION_ROOT);
> > > >
> > > > sub
> > > >  prepare table t and t_1
> > > > CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres'
> > > > PUBLICATION pub1, pub2;
> > > >
> > > > select * from pg_subscription_rel ;
> > > >  srsubid | srrelid | srsubstate | srsublsn
> > > > -+-++---
> > > >16391 |   16385(t) | r  | 0/150D100
> > > >16391 |   16388(t_1) | r  | 0/150D138
> > > >
> > > > If subscribe two publications one of them publish parent table with
> > > > (pubviaroot=true) and another publish child table. Both the parent table
> and
> > > > child table will exist in pg_subscription_rel which also means we will 
> > > > do
> > > > initial copy for both tables.
> > > >
> > > > But after initial copy, we only publish change with the schema of the
> parent
> > > > table(t). It looks a bit inconsistent.
> > > >
> > > > Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think
> > > the
> > > > expected behavior could be we only store the top most parent(table t) in
> > > > pg_subscription_rel and do initial copy for it if pubviaroot is on. I 
> > > > haven't
> > > > thought about how to fix this and will investigate this later.
> > > Hi,
> > > I try to fix this bug. Attach the patch.
> > >
> > > The current HEAD get table list for one publication by invoking function
> > > pg_get_publication_tables. If multiple publications are subscribed, then 
> > > this
> > > function is invoked multiple times. So option
> PUBLISH_VIA_PARTITION_ROOT
> > > works
> > > independently on every publication, I think it does not work correctly on
> > > different publications of the same subscription.
> > >
> > > So I fix this bug by the following two steps:
> > > First step,
> > > I get oids of subscribed tables by publication list. Then for tables with 
> > > the
> > > same topmost root table, I filter them base on the option
> > > PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids).
> > > After filtering, I get the final oid list.
> > > Second step,
> > > I get the required informations(nspname and relname) base on the oid list
> of
> > > first step.
> >
> > Thanks for updating the patch.
> > I confirmed that the bug is fixed by this patch.
> >
> > One suggestion is that can we simplify the code by moving the logic of
> checking
> > the ancestor into the SQL ?. For example, we could filter the outpout of
> > pg_publication_tables by adding A WHERE clause which checks whether the
> table
> > is a partition and if its ancestor is also in the output. I think we can 
> > also
> > filter the needless partition in this approach.
> >
> 
> I agreed with you and I tried to fix this problem in a simpler way. What we 
> want
> is to exclude the partitioned table whose ancestor is also need to be
> replicated, so how about implementing that by using the following SQL when
> getting the table list from publisher?
> 
> SELECT DISTINCT ns.nspname, c.relname
> FROM pg_catalog.pg_publication_tables t
> JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname
> JOIN pg_catalog.pg_class c ON c.relname = t.tablename AND c.relnamespace =
> ns.oid
> WHERE t.pubname IN ('p0','p2')
> AND (c.relispartition IS FALSE OR NOT EXISTS (SELECT 1 FROM
> pg_partition_ancestors(c.oid)
> WHERE relid IN ( SELECT DISTINCT (schemaname||'.'||tablename)::regclass::oid
> FROM pg_catalog.pg_publication_tables t
> WHERE t.pubname IN ('p0','p2') ) AND relid != c.oid));
> 
> Please find the attached patch which used this approach, I also merged the 
> test
> in Wang's patch into it.
Thanks for your review and patch.

I think the approach of v2 is better than v1. It does not increase the query.
I only moved the test cases from 100_bugs.pl to 013_partition.pl and simplified
them. Attached the new patch.

Regards,
Wang wei


v3-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch
Description:  v3-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch


Re: DataRow message for Integer(int4) returns result as text?

2022-04-20 Thread Tom Lane
Tyler Brock  writes:
> I think this makes sense but I wanted to get confirmation:
> I created a table with a column having the type int4 (integer). When I
> insert a row with a number into that column and get it back out I've
> observed a discrepancy:

> The DataRow message has the field encoded as an ASCII ‘7’ with a column
> length of 1 despite the RowDescription having a column length 4. I assume
> that this is because it’s a simple query (Q) and therefore the format code
> for all columns is 0 (for text format).

If you mean the "data type size" (typlen) field of RowDescription, that
is arguably completely irrelevant; it's there for historical reasons,
I think.  The contents of a DataRow field will either be a textual
conversion of the value or the on-the-wire binary representation defined
by the type's typsend routine.  In either case, the actual length of
the value as it appears in DataRow is given right there in the DataRow
message.  And in either case, the typlen value doesn't necessarily have
anything to do with the length of the DataRow representation.  typlen
does happen to match up with the length that'd appear in DataRow for
simple integral types sent in binary format ... but for other cases,
not so much.

regards, tom lane




Re: Skipping schema changes in publication

2022-04-20 Thread vignesh C
On Mon, Apr 18, 2022 at 12:32 PM Amit Kapila  wrote:
>
> On Fri, Apr 15, 2022 at 1:26 AM Euler Taveira  wrote:
> >
> > On Thu, Apr 14, 2022, at 10:47 AM, Peter Eisentraut wrote:
> >
> > On 12.04.22 08:23, vignesh C wrote:
> > > I have also included the implementation for skipping a few tables from
> > > all tables publication, the 0002 patch has the implementation for the
> > > same.
> > > This feature is helpful for use cases where the user wants to
> > > subscribe to all the changes except for the changes present in a few
> > > tables.
> > > Ex:
> > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> > > OR
> > > ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;
> >
> > We have already allocated the "skip" terminology for skipping
> > transactions, which is a dynamic run-time action.  We are also using the
> > term "skip" elsewhere to skip locked rows, which is similarly a run-time
> > action.  I think it would be confusing to use the term SKIP for DDL
> > construction.
> >
> > I didn't like the SKIP choice too. We already have EXCEPT for IMPORT FOREIGN
> > SCHEMA and if I were to suggest a keyword, it would be EXCEPT.
> >
>
> +1 for EXCEPT.

Updated patch by changing the syntax to use EXCEPT instead of SKIP.

Regards,
Vignesh
From 289b5acfc33f70f488f45e2cb55714d20097ac4c Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Wed, 20 Apr 2022 11:19:50 +0530
Subject: [PATCH v2] Skip publishing the tables specified in EXCEPT TABLE.

A new option "EXCEPT TABLE" in Create/Alter Publication allows
one or more tables to be excluded, publisher will exclude sending the data
of the tables present in the except table to the subscriber.

The new syntax allows specifying schemas. For example:
CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD EXCEPT TABLE t1,t2;

A new column prexcept is added to table "pg_publication_rel", to maintain
the relations that the user wants to exclude publishing through the publication.
Modified the output plugin (pgoutput) to exclude publishing the changes if the
relation is part of except table publication.

Updates pg_dump to identify and dump except table publications. Updates the \d
family of commands to display except table publications and \dRp+ variant will
now display associated except tables if any.

Bump catalog version.
---
 doc/src/sgml/catalogs.sgml|   9 ++
 doc/src/sgml/logical-replication.sgml |   8 +-
 doc/src/sgml/ref/alter_publication.sgml   |  14 ++-
 doc/src/sgml/ref/create_publication.sgml  |  29 -
 doc/src/sgml/ref/psql-ref.sgml|   5 +-
 src/backend/catalog/pg_publication.c  |  36 --
 src/backend/commands/publicationcmds.c| 106 +++---
 src/backend/commands/tablecmds.c  |   4 +-
 src/backend/parser/gram.y | 102 +++--
 src/backend/replication/pgoutput/pgoutput.c   |  25 ++---
 src/backend/utils/cache/relcache.c|  17 ++-
 src/bin/pg_dump/pg_dump.c |  35 --
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/pg_dump/pg_dump_sort.c|   7 ++
 src/bin/pg_dump/t/002_pg_dump.pl  |  23 
 src/bin/psql/describe.c   |  25 -
 src/bin/psql/tab-complete.c   |  15 ++-
 src/include/catalog/pg_publication.h  |   7 +-
 src/include/catalog/pg_publication_rel.h  |   1 +
 src/include/commands/publicationcmds.h|   4 +-
 src/include/nodes/parsenodes.h|   2 +
 src/test/regress/expected/publication.out |  81 -
 src/test/regress/sql/publication.sql  |  40 ++-
 .../t/033_rep_changes_except_table.pl |  97 
 24 files changed, 580 insertions(+), 113 deletions(-)
 create mode 100644 src/test/subscription/t/033_rep_changes_except_table.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a533a2153e..78e8c22a59 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6426,6 +6426,15 @@ SCRAM-SHA-256$:&l
   if there is no publication qualifying condition.
  
 
+ 
+  
+  prexcept bool
+  
+  
+   True if the table must be excluded
+  
+ 
+
  
   
prattrs int2vector
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a0f9cecd01..36e943d305 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1160,10 +1160,10 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
   
 
   
-   To add tables to a publication, the user must have ownership rights on the
-   table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables or all tables in
-   schema automatically, the user must be a superuser.
+   To add tables or exclude tables 

Re: [PATCH] Add native windows on arm64 support

2022-04-20 Thread Michael Paquier
On Wed, Apr 20, 2022 at 10:43:06AM +0100, Niyas Sait wrote:
>> This issue is still lying around, and you may have been lucky.  Would
>> there be any issues to remove this change to get a basic support in?
>> As mentioned upthread, there is a long history of Postgres with ASLR.
> 
> MSVC linker doesn't allow non-random base addresses for Arm64 platforms.
> It is needed for basic support.

Why does it not allow that?  Is that just a limitation of the
compiler?  If yes, what is the error happening?  This question is not
exactly answered yet as of this thread.  I may be missing a reference
about that in the upstream docs, but I see nowhere an explanation
about the reason and the why.  That's also one of the first questions
from Thomas upthread.
--
Michael


signature.asc
Description: PGP signature


Re: Postgres perl module namespace

2022-04-20 Thread Michael Paquier
On Wed, Apr 20, 2022 at 03:56:17PM -0400, Andrew Dunstan wrote:
> Basically I propose just to remove any mention of the Testlib items and
> get_free_port from the export and alias lists for versions where they
> are absent. If backpatchers need a function they can backport it if
> necessary.

Agreed.  I am fine to stick to that (I may have done that only once or
twice in the past years, so that does not happen a lot either IMO).
The patch in itself looks like an improvement in the right direction,
so +1 from me.
--
Michael


signature.asc
Description: PGP signature


Re: Fix NULL pointer reference in _outPathTarget()

2022-04-20 Thread Richard Guo
On Thu, Apr 21, 2022 at 12:02 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 18.04.22 09:35, Richard Guo wrote:
> > The array sortgrouprefs[] inside PathTarget might be NULL if we have not
> > identified sort/group columns in this tlist. In that case we would have
> > a NULL pointer reference in _outPathTarget() when trying to print
> > sortgrouprefs[] with WRITE_INDEX_ARRAY as we are using the length of
> > PathTarget->exprs as its array length.
>
> Do you have a test case that triggers this issue?
>

I don't have a test case. :(  I triggered this issue while debugging
with gdb and I was printing a certain 'pathlist' with nodeToString().

If it helps, here is the backtrace:

#0  in _outPathTarget (str=0x7fff683d7e50, node=0x56011e5cece0) at
outfuncs.c:2672
#1  in outNode (str=0x7fff683d7e50, obj=0x56011e5cece0) at outfuncs.c:4490
#2  in _outPathInfo (str=0x7fff683d7e50, node=0x56011e5f3408) at
outfuncs.c:1922
#3  in _outPath (str=0x7fff683d7e50, node=0x56011e5f3408) at outfuncs.c:1957
#4  in outNode (str=0x7fff683d7e50, obj=0x56011e5f3408) at outfuncs.c:4358
#5  in _outProjectionPath (str=0x7fff683d7e50, node=0x56011e5f3890) at
outfuncs.c:2154
#6  in outNode (str=0x7fff683d7e50, obj=0x56011e5f3890) at outfuncs.c:4409
#7  in _outAggPath (str=0x7fff683d7e50, node=0x56011e5f4550) at
outfuncs.c:2224
#8  in outNode (str=0x7fff683d7e50, obj=0x56011e5f4550) at outfuncs.c:4427
#9  in _outGatherPath (str=0x7fff683d7e50, node=0x56011e5f45e8) at
outfuncs.c:2142
#10 in outNode (str=0x7fff683d7e50, obj=0x56011e5f45e8) at outfuncs.c:4406
#11 in _outList (str=0x7fff683d7e50, node=0x56011e5f4680) at outfuncs.c:227
#12 in outNode (str=0x7fff683d7e50, obj=0x56011e5f4680) at outfuncs.c:4028
#13 in nodeToString (obj=0x56011e5f4680) at outfuncs.c:4782


Thanks
Richard


Re: Add --{no-,}bypassrls flags to createuser

2022-04-20 Thread Michael Paquier
On Tue, Apr 19, 2022 at 12:13:51PM -0400, Robert Haas wrote:
> On Mon, Apr 18, 2022 at 9:50 PM Kyotaro Horiguchi
>  wrote:
>> Hmm.. So, "-r/--role" and "-m/--member(ship)" is the (least worse) way
>> to go?  Or we can give up adding -m for the reason of being hard to
>> name it..
> 
> Hmm, yeah, I hadn't quite realized what the problem was when I wrote
> that. I honestly don't know what to do about that. Renaming the
> existing option is not great, but having the syntax diverge between
> SQL and CLI is not great either. Giving up is also not great. Not sure
> what is best.

Changing one existing option to mean something entirely different
should be avoided, as this could lead to silent breakages.  As the
origin of the problem is that the option --role means "IN ROLE" in the
SQL grammar, we could keep around --role for compatibility while
marking it deprecated, and add two new options whose names would be
more consistent with each other.  One choice could be --role-name and
--in-role-name, where --in-role-name maps to the older --role, just to
give an idea.
--
Michael


signature.asc
Description: PGP signature


Re: typos

2022-04-20 Thread Michael Paquier
On Wed, Apr 20, 2022 at 11:32:08PM +0200, Alvaro Herrera wrote:
> So the attached.
> 
> --- a/doc/src/sgml/install-windows.sgml
> +++ b/doc/src/sgml/install-windows.sgml
> @@ -307,9 +307,9 @@ $ENV{MSBFLAGS}="/m";
>  
>  
>  
> - ZSTD
> + Zstd
>   
> -  Required for supporting ZSTD compression
> +  Required for supporting Zstd compression

Looking at the zstd project itself for reference or just wiki-sensei,
I don't think that this is correct:
https://github.com/facebook/zstd
https://en.wikipedia.org/wiki/Zstd

Their README uses "zstd" in lower-case, while "Zstd" (first letter
upper-case) is used at the beginning of a sentence.
--
Michael


signature.asc
Description: PGP signature


How to debug JIT-ed code in PostgreSQL using GDB

2022-04-20 Thread Japin Li


Hi, hackers

I am trying to use gdb to debug JIT-ed code, but there isn't much useful
information available.  I tried the approach from [1], but it doesn't work
for me (llvm-10 in my environment).

How can I do this debugging?  Any suggestions?  Thanks in advance!


[1] https://releases.llvm.org/8.0.1/docs/DebuggingJITedCode.html

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical replication timeout problem

2022-04-20 Thread Masahiko Sawada
On Thu, Apr 21, 2022 at 11:19 AM Amit Kapila  wrote:
>
> On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila  wrote:
> > >
> >
> > > I think it would
> > > be then better to have it in the same place in HEAD as well?
> >
> > As far as I can see in the v17 patch, which is for HEAD, we don't add
> > a variable to LogicalDecodingContext, but did you refer to another
> > patch?
> >
>
> No, I thought it is better to follow the same approach in HEAD as
> well. Do you see any problem with it?

No, that makes sense to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Two small issues related to table_relation_copy_for_cluster() and CTAS with no data.

2022-04-20 Thread Paul Guo
Hello hackers,

While reading the latest master branch code, I found something that we
may be able to improve.

1. The am table_relation_copy_for_cluster() interface.

  static inline void
  table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
  Relation OldIndex,
  bool use_sort,
  TransactionId OldestXmin,
  TransactionId *xid_cutoff,
  MultiXactId *multi_cutoff,
  double *num_tuples,
  double *tups_vacuumed,
  double *tups_recently_dead)

- A line should be added for the num_tuples parameter below "Output
  parameters" in the comment.
- Looking at the caller code, i.e. copy_table_data(): it initializes
  *num_tuples, *tups_vacuumed and *tups_recently_dead first.  This does not
  seem to be good API design or implementation.  It would be better to let
  the AM API return the values without the caller having to initialize them,
  right?  (See the sketch after point 2 below.)

2. For CTAS (CREATE TABLE AS) with no data: it seems that we never reach
intorel_receive().  intorel_startup() can still be reached for "create table
as t1 execute with no data".  So it looks like we do not need to check
into->skipData in intorel_receive().  If we really want to check
into->skipData, we could add an assertion there, or, if I missed some code
paths that can reach that branch, we could instead call the code below in
intorel_receive() to stop early, right?

if (myState->into->skipData)
    return false;
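
Coming back to point 1, a paraphrased sketch of the caller-side pattern (not
verbatim from cluster.c; the relation and cutoff arguments come from the
surrounding, elided function):

    double      num_tuples = 0;
    double      tups_vacuumed = 0;
    double      tups_recently_dead = 0;

    /* Today the caller must zero the counters because the AM callback only
     * increments them. */
    table_relation_copy_for_cluster(OldTable, NewTable, OldIndex, use_sort,
                                    OldestXmin, &xid_cutoff, &multi_cutoff,
                                    &num_tuples, &tups_vacuumed,
                                    &tups_recently_dead);

    /* If the AM callback reset its output parameters itself, callers would
     * not need this pre-initialization. */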

Regards,
Paul




Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-20 Thread bu...@sohu.com
> > for now fuction cost_subqueryscan always using *total* rows even parallel
> > path. like this:
> >
> > Gather (rows=3)
> >   Workers Planned: 2
> >   ->  Subquery Scan  (rows=3) -- *total* rows, should be equal subpath
> > ->  Parallel Seq Scan  (rows=1)
>  
> OK, that's bad.
>  
> > Maybe the codes:
> >
> > /* Mark the path with the correct row estimate */
> > if (param_info)
> > path->path.rows = param_info->ppi_rows;
> > else
> > path->path.rows = baserel->rows;
> >
> > should change to:
> >
> > /* Mark the path with the correct row estimate */
> > if (path->path.parallel_workers > 0)
> > path->path.rows = path->subpath->rows;
> > else if (param_info)
> > path->path.rows = param_info->ppi_rows;
> > else
> > path->path.rows = baserel->rows;
>  
> Suppose parallelism is not in use and that param_info is NULL. Then,
> is path->subpath->rows guaranteed to be equal to baserel->rows? If
> yes, then we don't need to a three-part if statement as you propose
> here and can just change the "else" clause to say path->path.rows =
> path->subpath->rows. If no, then your change gives the wrong answer.
I checked some regression tests: sometimes the subquery scan has a filter,
so path->subpath->rows is guaranteed *not* to be equal to baserel->rows.
If the first patch is wrong, I don't know how to fix this; it looks like
I need someone's help.



Re: postgres_fdw: batch inserts vs. before row triggers

2022-04-20 Thread Etsuro Fujita
Hi,

On Tue, Apr 19, 2022 at 9:00 PM Tomas Vondra
 wrote:
> On 4/19/22 11:16, Etsuro Fujita wrote:
> > On Sun, Apr 17, 2022 at 6:20 PM Etsuro Fujita  
> > wrote:
> >> So I think we should disable batch insert in such cases, just as we
> >> disable multi insert when there are any before row triggers on the
> >> target (local) tables/partitions in copyfrom.c.  Attached is a patch
> >> for that.
> >
> > If there are no objections from Tomas or anyone else, I'll commit the patch.

> +1, I think it's a bug to do batch insert in this case.

Pushed and back-patched to v14, after tweaking a comment a little bit.

Thanks!

Best regards,
Etsuro Fujita