Re: Set query_id for query contained in utility statement

2024-10-27 Thread Michael Paquier
On Thu, Oct 24, 2024 at 11:28:55AM +0900, Michael Paquier wrote:
> Attached is the remaining piece, for DECLARE and CTAS.  The
> JumbleQuery() calls in ExecCreateTableAs() and ExplainOneUtility() for
> CTAS queries are twins, showing the inner queries of CTAS
> consistently.  DECLARE is covered by one call in ExplainOneUtility()
> and one in PerformCursorOpen().

I've come back to it with a fresher mind, and it still looked OK on a
second look, so applied after some slight tweaks.  It also means that
we should be done here, so I am marking the CF entry as committed.
--
Michael


signature.asc
Description: PGP signature


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Masahiko Sawada
On Sun, Oct 27, 2024 at 3:33 AM Stepan Neretin  wrote:
>
>
>
>>
>> IIUC after an immediate shutdown all pgstat entries are wiped out so
>> the server doesn't have any pgstat entries for databases at this
>> point. And since we don't run autovacuum on databases that have no
>> pg_stat entries, no autovacuum worker worked on the 'postgres'
>> database. Please try executing any query (e.g. 'select 1') on the
>> 'postgres' database after the restart, which creates a pgstat entry
>> for the database.
>>
>> > sleep(5);
>>
>> While the test script sleeps for 5 seconds, the server restarts after
>> a crash. So even if the assertion failure happens, the test would
>> appear to be successful. I think you can set 'restart_after_crash =
>> off' and execute another query using safe_psql() after the sleep. That
>> way, the test ends up with safe_psql() failure because the database
>> server is not running.
>
>
> Hi, thank you for your suggestions!  But they did not help me. Autovacuum 
> does not want to start :(
>

I've attached the TAP test that I meant.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $node = PostgreSQL::Test::Cluster->new('main');

# Create a data directory with initdb
$node->init;

$node->append_conf(
'postgresql.conf', qq[
autovacuum = on
track_counts = on
autovacuum_naptime = '1s'
autovacuum_max_workers = 3
autovacuum_vacuum_threshold = 1
log_statement = 'all'
restart_after_crash = off
]);

# Start the PostgreSQL server
$node->start;

my $psql1 = $node->interactive_psql('postgres');
$psql1->query("create temp table test (a int primary key);");
sleep(3);

$node->stop('immediate');
$node->start();
$node->safe_psql('postgres', 'select 1');

sleep(5);
$node->safe_psql('postgres', 'select 1');

ok(1);
done_testing();


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Michael Paquier
On Sun, Oct 27, 2024 at 05:19:45PM -0700, Masahiko Sawada wrote:
> my $psql1 = $node->interactive_psql('postgres');
> $psql1->query("create temp table test (a int primary key);");
> sleep(3);
> 
> $node->stop('immediate');
> $node->start();
> $node->safe_psql('postgres', 'select 1');
> 
> sleep(5);
> $node->safe_psql('postgres', 'select 1');
> 
> ok(1);
> done_testing();

The assertion failure happens in an autovacuum worker.  So if you are
looking for a test that can be integrated in the tree, you could get
some inspiration from 006_signal_autovacuum.pl and rely on an
injection point wait with the existing autovacuum-worker-start.  My
2c, as it looks easy enough to avoid the hardcoded waits.
--
Michael




Re: Add isolation test template in injection_points for wait/wakeup/detach

2024-10-27 Thread Michael Paquier
On Fri, Oct 25, 2024 at 09:33:12AM +0900, Michael Paquier wrote:
> I am going to remove the first permutation anyway to stabilize the CI
> reports.  But it feels like we have a different problem here, and I am
> not sure what.

A few days later, I have looked at the full set of CF bot jobs and
9f00edc22888 has stabilized the CI.  We went down from 20-ish failures
in the whole set of patches to 0.

I still don't have a clear idea about why this is limited to FreeBSD
13 and only for this permutation.  The buildfarm did not report
anything, except if I've missed something, of course.
--
Michael




Re: Inconsistent RestrictInfo serial numbers

2024-10-27 Thread Richard Guo
On Fri, Oct 11, 2024 at 3:02 PM Ashutosh Bapat
 wrote:
> 1. What strikes me as odd in your patch is that the last_rinfo_serial
> is reset so far away from the new clause that will be created with the
> next last_rinfo_serial OR the clause whose clones are being created.
> It either indicates another site where we are missing (possibly
> future) to restore rinfo_serial in a clone OR reset of
> last_serial_rinfo needs to happen somewhere else. Why isn't resetting
> last_rinfo_serial in deconstruct_distribute_oj_quals() enough?

Well, the resetting of the last_rinfo_serial counter in
deconstruct_distribute_oj_quals is to ensure that multiple clones of
the same qual receive the same serial number.  As the comment there
explains, it works only if we ensure that we are not changing the qual
list in any way that'd affect the number of RestrictInfos built from
it.

However, in b262ad440, we did not get this memo promptly.  As I
explained earlier, the has_clone version of qual 't1.a is null'
would be reduced to constant-FALSE, while the is_clone version would
not.  That is to say, for the has_clone version, we'd build three
RestrictInfos, whereas for the is_clone version, we'd build only two.
This discrepancy in numbers is the root cause of the issue.

So, resetting last_rinfo_serial in deconstruct_distribute_oj_quals()
is not enough.
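To illustrate, here is a toy Python model (not the planner code; the real counter is root->last_rinfo_serial and the values here are hypothetical) of how a differing RestrictInfo count between the has_clone and is_clone passes makes the serials diverge:

```python
# Toy model of last_rinfo_serial: each RestrictInfo built from a qual
# list gets the next value of a counter, and the counter is reset to
# the same saved value before each clone pass, so clones of the same
# qual get the same serial -- but only if both passes build the same
# number of RestrictInfos.

def build_rinfos(quals, counter_start):
    """Assign consecutive serials to the RestrictInfos built from quals."""
    serials = {}
    serial = counter_start
    for qual in quals:
        serial += 1
        serials[qual] = serial
    return serials, serial

start = 100  # hypothetical counter value saved before the reset

# has_clone pass: 't1.a IS NULL' is reduced, producing an extra
# constant-FALSE RestrictInfo, so three are built in total.
has_clone, after_has = build_rinfos(["q1", "q2", "const-false"], start)
# is_clone pass: the reduction does not apply; only two are built.
is_clone, after_is = build_rinfos(["q1", "q2"], start)

# The shared quals still match ...
assert has_clone["q1"] == is_clone["q1"]
# ... but the counters have diverged, so anything built after this
# point gets inconsistent serials between the two versions.
assert after_has != after_is
print(after_has, after_is)  # 103 102
```
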

> 2. This change would also prevent add_base_clause_to_rel() and
> add_join_clause_to_rels() from being used anywhere outside the context
> of deconstruct_distribute_oj_quals() since these functions would reset
> last_rinfo_serial when it's expected to be incremented. Moreover
> another make_restrictinfo() call somewhere in the minions of
> distribute_qual_to_rels() or distribute_quals_to_rels() would cause a
> similar bug.

I don't see how this change would prevent add_base_clause_to_rel and
add_join_clause_to_rels from being used in other context than
deconstruct_distribute_oj_quals.  Could you please provide an example?

> 3. Do we need to reset last_serial_rinfo everywhere we reset
> rinfo_serial e.g. create_join_clause()?

Hmm, I don't think that's necessary.  As long as we'd build the same
number of RestrictInfos from the qual list for different versions,
we're good.

> I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do
> anything wrong. It's effectively copying rinfo_serial of original
> clause into the derived clause. I would have liked it better if it
> would have used the same method as create_join_clause(). Some other
> commit failed to notice that some minion of
> distribute_quals_to_rels()->distribute_qual_to_rels() may increment
> last_rinfo_serial while creating an alternate RestrictInfo for the
> qual passed to distribute_qual_to_rels(). I think a robust fix is to
> reset last_rinfo_serial in distribute_quals_to_rels() for every qual
> and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial +
> list_length(quals)) before resetting last_rinfo_serial in
> deconstruct_distribute_oj_quals() to catch any future digressions.

Would you mind proposing a patch for this method so we can discuss the
details?

Thanks
Richard




Re: Forbid to DROP temp tables of other sessions

2024-10-27 Thread Michael Paquier
On Fri, Oct 25, 2024 at 02:01:23PM -0400, Tom Lane wrote:
> If autovacuum can do it, I don't see a reason to prevent superusers
> from doing it manually.

Being able to do a DROP of a temporary table in a controlled way can
also be very handy when working on a cluster with a corrupted catalog
state, so it is a feature if superusers are able to do that.  I'm
pretty sure that we have this argument once every few years on
pgsql-hackers..
--
Michael




Re: Pgoutput not capturing the generated columns

2024-10-27 Thread Peter Smith
Hi, here are my review comments for patch v43-0001.

==

1. Missing docs update?

The CREATE PUBLICATION docs currently says:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all columns of the table
are replicated through this publication, including any columns added
later.

~

For this patch, should that be updated to say "... all columns (except
generated columns) of the table are replicated..."

==
src/backend/replication/logical/proto.c

2.
+static bool
+should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ return false;
+
+ if (!column_in_column_list(att->attnum, columns))
+ return false;
+
+ return true;
+}

Here, I wanted to suggest that the whole "Skip publishing generated
columns" if-part is unnecessary because the next check
(!column_in_column_list) is going to return false for the same
scenario anyhow.

But, unfortunately, the "column_in_column_list" function has some
special NULL handling logic in it; this means none of this code is
quite what it seems to be (e.g. the function name
column_in_column_list is somewhat misleading)

IMO it would be better to change the column_in_column_list signature
-- add another boolean param to say if a NULL column list is allowed
or not. That will remove any subtle behaviour and then you can remove
the "if (att->attgenerated && !columns)" part.

==
src/backend/replication/pgoutput/pgoutput.c

3. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (att->atttypid < FirstGenbkiObjectId)
  continue;

+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ continue;
+
  /* Skip this attribute if it's not present in the column list */
  if (columns != NULL && !bms_is_member(att->attnum, columns))
  continue;
~

Most of that code above looks to be doing the very same thing as the
new 'should_publish_column' in proto.c. Won't it be better to expose
the other function and share the common logic?

~~~

4. pgoutput_column_list_init

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
+ continue;
+ }
+
  nliveatts++;
  }

  /*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
  */
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
  {

Something seems not quite right (or maybe redundant) with this logic.
For example, because you unconditionally 'continue' for generated
columns, then AFAICT it is just not possible for bms_num_members(cols)
== nliveatts and at the same time 'gencolpresent' to be true. So you
could've just Asserted (!gencolpresent) instead of checking it in the
condition and mentioning it in the comment.
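The claimed invariant can be checked with a small Python model (illustrative only; tuples stand in for Form_pg_attribute and a set for the cols Bitmapset):

```python
def column_list_counts(attrs, cols):
    """attrs: (attnum, isdropped, isgenerated) tuples; cols: column-list set."""
    nliveatts = 0
    gencolpresent = False
    for attnum, isdropped, isgenerated in attrs:
        if isdropped:
            continue
        if isgenerated:
            if attnum in cols:
                gencolpresent = True
            continue  # generated columns are never counted as live
        nliveatts += 1
    return nliveatts, gencolpresent

attrs = [(1, False, False), (2, False, False), (3, False, True)]

# A generated column in the list inflates len(cols) past nliveatts, so
# len(cols) == nliveatts and gencolpresent cannot both hold (given that
# cols only names existing, non-dropped columns of the table).
assert column_list_counts(attrs, {1, 2, 3}) == (2, True)   # 3 != 2
assert column_list_counts(attrs, {1, 2}) == (2, False)
```
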

==
src/test/regress/expected/publication.out

5.
--- error: generated column "d" can't be in list
+-- ok: generated column "d" can be in the list too
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
-ERROR:  cannot use generated column "d" in publication column list

By allowing the above to work without giving ERROR, I think you've
broken many subsequent test expected results. e.g. I don't trust these
"expected" results anymore because I didn't think these next test
errors should have been affected, right?

 -- error: system attributes "ctid" not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
-ERROR:  cannot use system column "ctid" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication
"testpub_fortable"

Hmm - looks like a wrong expected result to me.

~

 -- error: duplicates not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
-ERROR:  duplicate column "a" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication
"testpub_fortable"

Hmm - looks like a wrong expected result to me.

probably more like this...

==
src/test/subscription/t/031_column_list.pl

6.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE test_gen (a int PRIMARY KEY, b int);
+));
+
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE SUBSCRIPTION sub_gen CONNECTION '$publisher_connstr'
PUBLICATION pub_gen;
+));

Should combine these.

~~~

7.
+$node_publisher->wait_for_catchup('sub_gen');
+
+is( $node_subscriber->safe_psql(
+ 'postgres', "SELECT * FROM test_gen ORDER BY a"),
+ qq(1|2),
+ 'replication with generated columns in column list');
+

But, this is only testing normal replication. You should also include
some initial table data so you can test that the initial table
synchronization works too. Otherwise, I think this patch currently has
no proof that the initial 'copy_data' even works at all.

RE: Pgoutput not capturing the generated columns

2024-10-27 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

More comments for v43-0001.

01. publication.out and publication.sql

I think your fix is not sufficient, even if it passes the tests.

```
-- error: system attributes "ctid" not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
-ERROR:  cannot use system column "ctid" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication 
"testpub_fortable"
 ALTER PUBLICATION testpub_fortable SET TABLE testpub_tbl1 (id, ctid);
 ERROR:  cannot use system column "ctid" in publication column list
 -- error: duplicates not allowed in column list
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
-ERROR:  duplicate column "a" in publication column list
+ERROR:  relation "testpub_tbl5" is already member of publication 
"testpub_fortable
```

The error message does not match the comment. The comment says that the table
has already been added to the publication. I think the first statement [1]
succeeded because of your change, testpub_tbl5 became a member at that point,
and the subsequent ALTER statements then failed due to the duplicate
registration.

```
-- ok
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c);
+ERROR:  relation "testpub_tbl5" is already member of publication 
"testpub_fortable"
```

The comment says this is OK, but the same error happened.

```
ALTER TABLE testpub_tbl5 DROP COLUMN c; -- no dice
-ERROR:  cannot drop column c of table testpub_tbl5 because other objects 
depend on it
-DETAIL:  publication of table testpub_tbl5 in publication testpub_fortable 
depends on column c of table testpub_tbl5
-HINT:  Use DROP ... CASCADE to drop the dependent objects too.
```

This statement should have failed because c was included in the column list.
However, it succeeded because the previous ALTER PUBLICATION failed.
Subsequent SQL statements wrongly threw ERRORs because of this.

Please look at all of the differences before doing a copy-and-paste.

02. 031_column_list.pl

```
-# TEST: Generated and dropped columns are not considered for the column list.
+# TEST: Dropped columns are not considered for the column list.
 # So, the publication having a column list except for those columns and a
 # publication without any column (aka all columns as part of the columns
 # list) are considered to have the same column list.
```

Based on the comment, this case does not test the behavior of generated columns
anymore. So, I felt column 'd' could be removed from the case.

03. 031_column_list.pl

Can we test that generated columns won't be replicated if they are not
included in the column list?

[1]:
```
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Pgoutput not capturing the generated columns

2024-10-27 Thread Amit Kapila
On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
>
> 7.
> +$node_publisher->wait_for_catchup('sub_gen');
> +
> +is( $node_subscriber->safe_psql(
> + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> + qq(1|2),
> + 'replication with generated columns in column list');
> +
>
> But, this is only testing normal replication. You should also include
> some initial table data so you can test that the initial table
> synchronization works too. Otherwise, I think this patch currently has
> no proof that the initial 'copy_data' even works at all.
>

Per my tests, the initial copy doesn't work with 0001 alone. It needs
changes in tablesync.c from the 0002 patch. Now, we can commit 0001
after fixing comments and mentioning in the commit message that this
patch supports only the replication of generated columns when
specified in the column list. The initial sync and replication of
generated columns when not specified in the column list will be
supported in future commits. OTOH, if the change to make table sync
work is simple, we can even combine that change.

-- 
With Regards,
Amit Kapila.




Re: Pgoutput not capturing the generated columns

2024-10-27 Thread Peter Smith
On Mon, Oct 28, 2024 at 4:34 PM Amit Kapila  wrote:
>
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
> >
> > Hi, here are my review comments for patch v43-0001.
> >
> > ==
> > src/backend/replication/logical/proto.c
> >
> > 2.
> > +static bool
> > +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> > +{
> > + if (att->attisdropped)
> > + return false;
> > +
> > + /*
> > + * Skip publishing generated columns if they are not included in the
> > + * column list.
> > + */
> > + if (att->attgenerated && !columns)
> > + return false;
> > +
> > + if (!column_in_column_list(att->attnum, columns))
> > + return false;
> > +
> > + return true;
> > +}
> >
> > Here, I wanted to suggest that the whole "Skip publishing generated
> > columns" if-part is unnecessary because the next check
> > (!column_in_column_list) is going to return false for the same
> > scenario anyhow.
> >
> > But, unfortunately, the "column_in_column_list" function has some
> > special NULL handling logic in it; this means none of this code is
> > quite what it seems to be (e.g. the function name
> > column_in_column_list is somewhat misleading)
> >
> > IMO it would be better to change the column_in_column_list signature
> > -- add another boolean param to say if a NULL column list is allowed
> > or not. That will remove any subtle behaviour and then you can remove
> > the "if (att->attgenerated && !columns)" part.
> >
>
> The NULL column list still means all columns, so changing the behavior
> as you are proposing doesn't make sense and would make the code
> difficult to understand.
>

My point was that the function 'column_in_column_list' would return
true even when there is no publication column list at all, so that
function name is misleading.

And, because in patch 0001 the generated columns only work when
specified via a column list it means now there is a difference
between:
- NULL (all columns specified in the column list) and
- NULL (no column list at all).

which seems strange and likely to cause confusion.

On closer inspection, this function 'column_in_column_list; is only
called from one place -- the new 'should_publish_column()'. I think
the function column_in_column_list should be thrown away and just
absorbed into the calling function 'should_publish_column'. Then the
misleading function name is eliminated, and the special NULL handling
can be commented on properly.
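For illustration, a Python sketch of what the absorbed function's behaviour might look like (hypothetical -- the real function takes a Form_pg_attribute and a Bitmapset; None models a missing column list):

```python
def should_publish_column(attisdropped, attgenerated, attnum, columns):
    """columns is None when the publication has no column list at all."""
    if attisdropped:
        return False
    if columns is None:
        # No column list: publish every live column except generated
        # ones, which (in patch 0001) need an explicit column list.
        return not attgenerated
    return attnum in columns

assert should_publish_column(False, True, 3, None) is False   # generated, no list
assert should_publish_column(False, True, 3, {3}) is True     # generated, listed
assert should_publish_column(False, False, 2, None) is True   # ordinary, no list
assert should_publish_column(True, False, 1, None) is False   # dropped
```
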

==
Regards,
Peter Smith.
Fujitsu Australia




Re: New "raw" COPY format

2024-10-27 Thread jian he
On Thu, Oct 24, 2024 at 2:30 PM Joel Jacobson  wrote:
>
> On Thu, Oct 24, 2024, at 03:54, Masahiko Sawada wrote:
> > I have one question:
> >
> > From the 0001 patch's commit message:
> >
> > No behavioral changes are intended; this is a pure refactoring to improve 
> > code
> > clarity and maintainability.
> >
> > Does the reorganization of the option validation done by this patch
> > also help make the 0002 patch simple or small?
>
> Thanks for the review. No, not much, except the changes necessary to
> ProcessCopyOptions for raw, without also refactoring it, makes it
> more complicated.
>
> > If not much, while it
> > makes sense to me that introducing the CopyFormat enum is required by
> > the 0002 patch, I think we can discuss the reorganization part
> > separately. And I'd suggest the patch organization would be:
> >
> > 0001: introduce CopyFormat and replace csv_mode and binary fields with it.
> > 0002: add new 'raw' format.
> > 0003: reorganize option validations.
> >


  /* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote ||
opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
  ereport(ERROR,
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */

Maybe this has a code indentation issue, since "if" and "opts_out" are at
the same column position.

It occurred to me that we could change the errmsg occurrences of "BINARY mode"
and "CSV mode" to "binary format" and "csv format" respectively; I think
"format" would be more accurate. But that message change seems invasive, so I
guess we need to keep "mode".

Overall, v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
looks good to me.




Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread Dave Page
On Sun, 27 Oct 2024 at 10:17, 黄铎彦 <00...@fzu.edu.cn> wrote:

> I could hardly find detailed and complete tutorials on building the latest
> version with Visual Studio.
>

You can see how it’s done if you take a look at the actions in
https://github.com/dpage/winpgbuild. The postgresql-dev one is probably the
best example, as it only runs a single meson build, whilst the postgresql
action is a matrix build that has to handle both the old style msvc and
meson, and thus is much harder to read.

The repo also shows how to build (nearly) all the dependencies in a meson
compatible way, or you can just download the build artefacts if you don’t
want to do that yourself (see the bundle-deps action). You may not see the
artefacts if you’re not logged in to Github.

> I did use meson to generate sln and vcxproj files in Command Prompt
> (attachment: 00meson_cmd_to_gen_sln.txt). But after opening the sln and
> running the menu command "build -> build solution", it failed. The contents
> of the Output Window in Visual Studio are in the attachment 01build_log.txt.
>
> ----- Original Message -----
> *From:* "Bill Smith" 
> *Sent:* 2024-10-27 15:07:39
> *To:* "Mark Hill" 
> *Cc:* "pgsql-hackers@lists.postgresql.org" <pgsql-hackers@lists.postgresql.org>
>
> *Subject:* Re: msvc directory missing in PostgreSQL 17.0
>
>
> Check out this: the "Meson" page on wiki.postgresql.org
>
> There's a table showing the setup & build commands using the old way and
> the new way (meson).
>
> On Oct 21, 2024, at 4:12 PM, Mark Hill  wrote:
>
> Thanks Bill!   Do you have a sample meson command for building that you
> could share?
>
> Thanks, Mark
>
> *From:* Bill Smith 
> *Sent:* Friday, October 18, 2024 4:11 PM
> *To:* Mark Hill 
> *Cc:* pgsql-hackers@lists.postgresql.org
> *Subject:* Re: msvc directory missing in PostgreSQL 17.0
>
>
> *EXTERNAL*
>
>
>
> On Oct 18, 2024, at 4:05 PM, Mark Hill  wrote:
>
> I just downloaded the Postgresql 17.0 and the extracted directory
> postgresql-17.0/src/tools does not contain
>
> the “msvc” directory as it has since I’ve been building Postgres 12, 14,
> and 16.   How am I supposed to build
> PostgreSQL for Windows now?   I’m not sure what msvc_gendef.pl does.
> How do we build Postgres for Windows
> using Visual Studio?
>
>
> You’ll want to use meson to build PG.  See the docs here:
> 17.7. Platform-Specific Notes, on postgresql.org
> 
>
>
> Thanks, Mark
>
>
>


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Stepan Neretin
> IIUC after an immediate shutdown all pgstat entries are wiped out so
> the server doesn't have any pgstat entries for databases at this
> point. And since we don't run autovacuum on databases that have no
> pg_stat entries, no autovacuum worker worked on the 'postgres'
> database. Please try executing any query (e.g. 'select 1') on the
> 'postgres' database after the restart, which creates a pgstat entry
> for the database.
>
> > sleep(5);
>
> While the test script sleeps for 5 seconds, the server restarts after
> a crash. So even if the assertion failure happens, the test would
> appear to be successful. I think you can set 'restart_after_crash =
> off' and execute another query using safe_psql() after the sleep. That
> way, the test ends up with safe_psql() failure because the database
> server is not running.
>

Hi, thank you for your suggestions!  But they did not help me. Autovacuum
does not want to start :(

```
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;


my $node = PostgreSQL::Test::Cluster->new('main');

$node->init;

$node->append_conf(
'postgresql.conf', qq[
autovacuum = on
autovacuum_naptime = 1s
autovacuum_max_workers = 1
restart_after_crash = off
]);


$node->start;

my $psql1 = $node->interactive_psql('postgres');
$psql1->query("create temp table test (a int primary key);");

$node->stop('immediate');
sleep(5);

$node->start;

sleep(3);

$node->restart;

my $psql2 = $node->interactive_psql('postgres');
$psql2->query('SELECT 1;');
$psql2->query('SELECT 1;');

my $regexp = qr/autovacuum/;
my $offset = 0;

$node->wait_for_log($regexp, $offset);

done_testing();

```

Best Regards, Stepan Neretin.


Re: Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread huangdy2...@qq.com
On Sun, 27 Oct 2024 at 16:38 GMT+8, Dave Page  wrote:


>You can see how it’s done if you take a look at the actions in

>https://github.com/dpage/winpgbuild. The postgresql-dev one is probably the

>best example, as it only runs a single meson build, whilst the postgresql

>action is a matrix build that has to handle both the old style msvc and

>meson, and thus is much harder to read.

>

>The repo also shows how to build (nearly) all the dependencies in a meson

>compatible way, or you can just download the build artefacts if you don’t

>want to do that yourself (see the bundle-deps action). You may not see the

>artefacts if you’re not logged in to Github.

That repo seems to be something like a GitHub action. But I would like to
generate sln and vcxproj files on my Windows PC, then open Visual Studio,
and compile and navigate.

(Just now I changed my email address)


Re: Statistics Import and Export

2024-10-27 Thread Alexander Lakhin

Hello Jeff and Corey,

26.10.2024 01:18, Jeff Davis wrote:

On Tue, 2024-09-17 at 05:02 -0400, Corey Huinker wrote:

I've taken most of Jeff's work, reincorporated it into roughly the
same patch structure as before, and am posting it now.

I have committed the import side of this patch series; that is, the
function calls that can load stats into an existing cluster without the
need to ANALYZE.

The pg_restore_*_stats() functions are designed such that pg_dump can
emit the calls. Some design choices of the functions worth noting:


Please look at the following seemingly atypical behavior of the new
functions:
CREATE TABLE test(id int);

SELECT pg_restore_attribute_stats(
  'relation', 'test'::regclass,
  'attname', 'id'::name,
  'inherited', false);

SELECT pg_restore_attribute_stats(
  'relation', 'test'::regclass,
  'attname', 'id'::name,
  'inherited', false
) FROM generate_series(1, 2);
ERROR:  XX000: tuple already updated by self
LOCATION:  simple_heap_update, heapam.c:4353

Or:
SELECT pg_clear_attribute_stats('test'::regclass, 'id'::name, false)
FROM generate_series(1, 2);
ERROR:  XX000: tuple already updated by self
LOCATION:  simple_heap_delete, heapam.c:3108

Best regards,
Alexander




Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread huangdy2...@qq.com
On Sun, 27 Oct 2024 at 17:38 GMT+8, 黄铎彦  wrote:


>That repo seems to be something like Github action. But I would like to 
>generate sln and vcxproj files in my Windows PC, then open Visual Studio, 
>compile and navigate.


I used `meson setup --backend vs` but got a compile error with the sln file.

Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-10-27 Thread jian he
On Mon, Oct 21, 2024 at 8:39 PM Fujii Masao  wrote:
> On 2024/10/21 18:30, Kirill Reshke wrote:
> > v4 no longer applies. It now conflicts with
> > e7834a1a251d4a28245377f383ff20a657ba8262.
> > Also, there were review comments.
> >
> > So, I decided to rebase.
>
> Thanks for the patch! Here are my review comments:
>
> I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
> and columns with errors. It's "unexpected" thing for columns to be silently
> replaced with NULL due to on_error=set_to_null. So, similar to 
> on_error=ignore,
> there should be NOTICE messages indicating which input records had columns
> set to NULL because of data type incompatibility. Without these messages,
> users might not realize that some columns were set to NULL.
>

For on_error=set_to_null, we have two options for CopyFromStateData->num_errors:
A. counting the number of rows in which set_to_null happened, or
B. counting the number of times set_to_null happened.

Let's say option A:

ereport(NOTICE,
        errmsg_plural("%llu row was converted to NULL due to data type incompatibility",
                      "%llu rows were converted to NULL due to data type incompatibility",
                      (unsigned long long) cstate->num_errors,
                      (unsigned long long) cstate->num_errors));

I doubt the above message is accurate.
"%llu rows were converted to NULL" can be read as
"for each of %llu rows, all columns were converted to NULL",
but what actually happens here is
"for each of %llu rows, some columns (possibly all) were converted to NULL".


Option B: the message could be:
errmsg_plural("converted to NULL due to data type incompatibility happened %llu times")
but I feel that wording is not perfect either.

So overall I am not sure how to construct the NOTICE messages.
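For illustration, the two counting options can be modelled in a few lines of Python (a hypothetical sketch only; in the real code the counter is cstate->num_errors):

```python
def count_set_to_null(rows):
    """rows: one list per input row, naming the columns whose conversion
    failed and was replaced by NULL under on_error=set_to_null."""
    option_a = sum(1 for failed in rows if failed)   # rows affected
    option_b = sum(len(failed) for failed in rows)   # total replacements
    return option_a, option_b

# Three input rows; two contain failures, three columns replaced in total,
# so the two schemes report different numbers for the same input.
rows = [["a", "b"], [], ["c"]]
assert count_set_to_null(rows) == (2, 3)
```
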




Re: Pgoutput not capturing the generated columns

2024-10-27 Thread Amit Kapila
On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v43-0001.
>
> ==
> src/backend/replication/logical/proto.c
>
> 2.
> +static bool
> +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + return false;
> +
> + if (!column_in_column_list(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> Here, I wanted to suggest that the whole "Skip publishing generated
> columns" if-part is unnecessary because the next check
> (!column_in_column_list) is going to return false for the same
> scenario anyhow.
>
> But, unfortunately, the "column_in_column_list" function has some
> special NULL handling logic in it; this means none of this code is
> quite what it seems to be (e.g. the function name
> column_in_column_list is somewhat misleading)
>
> IMO it would be better to change the column_in_column_list signature
> -- add another boolean param to say if a NULL column list is allowed
> or not. That will remove any subtle behaviour and then you can remove
> the "if (att->attgenerated && !columns)" part.
>

The NULL column list still means all columns, so changing the behavior
as you are proposing doesn't make sense and would make the code
difficult to understand.

>
> 4. pgoutput_column_list_init
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> + if (att->attgenerated)
> + {
> + if (bms_is_member(att->attnum, cols))
> + gencolpresent = true;
> +
> + continue;
> + }
> +
>   nliveatts++;
>   }
>
>   /*
> - * If column list includes all the columns of the table,
> - * set it to NULL.
> + * If column list includes all the columns of the table
> + * and there are no generated columns, set it to NULL.
>   */
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts && !gencolpresent)
>   {
>
> Something seems not quite right (or maybe redundant) with this logic.
> For example, because you unconditionally 'continue' for generated
> columns, then AFAICT it is just not possible for bms_num_members(cols)
> == nliveatts and at the same time 'gencolpresent' to be true. So you
> could've just Asserted (!gencolpresent) instead of checking it in the
> condition and mentioning it in the comment.
>

It seems part of the logic is redundant. I propose to change something
along the lines of the attached. I haven't tested the attached change
as it shows how we can improve this part of code.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d59a8f5032..17aeb80637 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1081,7 +1081,7 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		int			i;
 		int			nliveatts = 0;
 		TupleDesc	desc = RelationGetDescr(relation);
-		bool		gencolpresent = false;
+		bool		att_gen_present = false;
 
 		pgoutput_ensure_entry_cxt(data, entry);
 
@@ -1098,20 +1098,19 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (att->attgenerated)
 			{
-				if (bms_is_member(att->attnum, cols))
-					gencolpresent = true;
-
-				continue;
+				att_gen_present = true;
+				break;
 			}
 
 			nliveatts++;
 		}
 
 		/*
-		 * If column list includes all the columns of the table
-		 * and there are no generated columns, set it to NULL.
+		 * Generated attributes are published only when they are
+		 * present in the column list. Otherwise, a NULL column
+		 * list means publish all columns.
 		 */
-		if (bms_num_members(cols) == nliveatts && !gencolpresent)
+		if (!att_gen_present && bms_num_members(cols) ==

Re: Conflict detection for update_deleted in logical replication

2024-10-27 Thread Peter Smith
Hi Hou-San, here are a few trivial comments remaining for patch v6-0001.

==
General.

1.
There are multiple comments in this patch mentioning 'wal' which
probably should say 'WAL' (uppercase).

~~~

2.
There are multiple comments in this patch missing periods (.)

==
doc/src/sgml/protocol.sgml

3.
+Primary status update (B)
+
+ 
+  
+   Byte1('s')

Currently, there are identifiers 's' for the "Primary status update"
message, and 'S' for the "Primary status request" message.

As mentioned in the previous review ([1] #5b) I preferred it to be the
other way around:
'S' = status from primary
's' = request status from primary

Of course, it doesn't make any difference, but "S" seems more
important than "s", so therefore "S" being the main msg and coming
from the *primary* seemed more natural to me.

~~~

4.
+   
+Primary status request (F)

Is it better to call this slightly differently to emphasise this is
only the request?

/Primary status request/Request primary status update/

==
src/backend/replication/logical/worker.c

5.
+ * Retaining the dead tuples for this period is sufficient for ensuring
+ * eventual consistency using last-update-wins strategy, as dead tuples are
+ * useful for detecting conflicts only during the application of concurrent

As mentioned in review [1] #9, this is still called "last-update-wins
strategy" here in the comment, but was called the "last update win
strategy" strategy in the commit message. Those terms should be the
same -- e.g. the 'last-update-wins' strategy.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPs3sgXh2%3DrHDaqjU%3Dp28CK5rCgCLJZgPByc6Tr7_P2imw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Make default subscription streaming option as Parallel

2024-10-27 Thread Amit Kapila
On Tue, Oct 22, 2024 at 9:26 PM vignesh C  wrote:
>
> The attached v5 version has the change to create subscriptions in
> streaming off mode. I also did not find any other TAP test which
> required further changes.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Stepan Neretin
>
> The assertion failure happens in an autovacuum worker.  So if you are
> looking for a test that can be integrated in the tree, you could get
> some inspiration from 006_signal_autovacuum.pl and rely on an
> injection point wait with the existing autovacuum-worker-start.  My
> 2c, as it looks easy enough to avoid the hardcoded waits.
> --
>

Greetings, I appreciate your proposal. I have attempted to implement your
suggestion, but unfortunately it did not yield the desired results.

```
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

if (($ENV{enable_injection_points} // '') ne 'yes') {
plan skip_all => 'Injection points not supported by this build';
}

my $node = PostgreSQL::Test::Cluster->new('main');
$node->init;

$node->append_conf(
'postgresql.conf', qq[
autovacuum = on
autovacuum_naptime = 1
autovacuum_max_workers = 1
]);

$node->start;

if (!$node->check_extension('injection_points')) {
plan skip_all => 'Extension injection_points not installed';
}



my $psql1 = $node->interactive_psql('postgres');
$psql1->query("create temp table test (a int primary key);");
$node->stop('immediate');
$node->start;
$node->restart;

my $psql2 = $node->interactive_psql('postgres');
$psql2->query('CREATE EXTENSION injection_points;');
$psql2->query("SELECT injection_points_attach('autovacuum-worker-start',
'wait');");

$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
$psql2->query('select 1');

my $regexp = qr/autovacuum:/;
my $offset = 0;

ok( $node->log_contains(
$regexp,
$offset),
"autovacuum not started");


done_testing();

```
Best Regards, Stepan Neretin!


RE: Pgoutput not capturing the generated columns

2024-10-27 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch! I resumed reviewing the patch set.
Here are only cosmetic comments as my rehabilitation.

01. getPublications()

I feel we could follow the notation of getSubscriptions(), because the number
of parameters has become larger. How do you feel about the attached?


02. fetch_remote_table_info()

```
  "SELECT DISTINCT"
- "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
- "   THEN NULL ELSE gpt.attrs END)"
+ "  (gpt.attrs)"
```

I think there is no need to separate the lines and add brackets. How about the below?

```
 "SELECT DISTINCT gpt.attrs"
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1d79865058..5e2d28b447 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4292,30 +4292,26 @@ getPublications(Archive *fout)
 	query = createPQExpBuffer();
 
 	/* Get the publications. */
+	appendPQExpBufferStr(query, "SELECT p.tableoid, p.oid, p.pubname, "
+						 "p.pubowner, "
+						 "puballtables, p.pubinsert, p.pubupdate, p.pubdelete, ");
+
+	if (fout->remoteVersion >= 11)
+		appendPQExpBufferStr(query, "p.pubtruncate, ");
+	else
+		appendPQExpBufferStr(query, "false AS pubtruncate, ");
+
+	if (fout->remoteVersion >= 13)
+		appendPQExpBufferStr(query, "p.pubviaroot, ");
+	else
+		appendPQExpBufferStr(query, "false AS pubviaroot, ");
+
 	if (fout->remoteVersion >= 18)
-		appendPQExpBufferStr(query,
-							 "SELECT p.tableoid, p.oid, p.pubname, "
-							 "p.pubowner, "
-							 "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, p.pubtruncate, p.pubviaroot, p.pubgencols "
-							 "FROM pg_publication p");
-	else if (fout->remoteVersion >= 13)
-		appendPQExpBufferStr(query,
-							 "SELECT p.tableoid, p.oid, p.pubname, "
-							 "p.pubowner, "
-							 "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, p.pubtruncate, p.pubviaroot, false AS pubgencols "
-							 "FROM pg_publication p");
-	else if (fout->remoteVersion >= 11)
-		appendPQExpBufferStr(query,
-							 "SELECT p.tableoid, p.oid, p.pubname, "
-							 "p.pubowner, "
-							 "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, p.pubtruncate, false AS pubviaroot, false AS pubgencols "
-							 "FROM pg_publication p");
+		appendPQExpBufferStr(query, "p.pubgencols ");
 	else
-		appendPQExpBufferStr(query,
-							 "SELECT p.tableoid, p.oid, p.pubname, "
-							 "p.pubowner, "
-							 "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, false AS pubtruncate, false AS pubviaroot, false AS pubgencols "
-							 "FROM pg_publication p");
+		appendPQExpBufferStr(query, "false AS pubgencols ");
+
+	appendPQExpBufferStr(query, "FROM pg_publication p");
 
 	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
 


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Masahiko Sawada
On Sun, Oct 27, 2024 at 5:50 PM Michael Paquier  wrote:
>
> On Sun, Oct 27, 2024 at 05:19:45PM -0700, Masahiko Sawada wrote:
> > my $psql1 = $node->interactive_psql('postgres');
> > $psql1->query("create temp table test (a int primary key);");
> > sleep(3);
> >
> > $node->stop('immediate');
> > $node->start();
> > $node->safe_psql('postgres', 'select 1');
> >
> > sleep(5);
> > $node->safe_psql('postgres', 'select 1');
> >
> > ok(1);
> > done_testing();
>
> The assertion failure happens in an autovacuum worker.  So if you are
> looking for a test that can be integrated in the tree, you could get
> some inspiration from 006_signal_autovacuum.pl and rely on an
> injection point wait with the existing autovacuum-worker-start.  My
> 2c, as it looks easy enough to avoid the hardcoded waits.

Thank you for your suggestion. IMHO I'm not sure we need to have a
regression test for this bug as it's just an oversight of recently
committed code. My suggestion was just to help Stepan reproduce this
failure using TAP tests.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Alias of VALUES RTE in explain plan

2024-10-27 Thread Andrei Lepikhov

On 10/28/24 03:15, Yasir wrote:
By design of my solution, I was not taking it as a bug. But now, I agree 
with your opinion.
I think the case provided by Ashutosh was initially correct, and nothing 
needs to change. Look at the similar case:


EXPLAIN SELECT x,y FROM (
  SELECT oid,relname FROM pg_class WHERE relname = 'pg_index') AS 
c(x,y) WHERE c.y = 'pg_index';


                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=68)
   Index Cond: (relname = 'pg_index'::name)
(2 rows)

I don't see any reference to the alias c(x,y) in this explain. 
Similarly, the flattened VALUES clause shouldn't be referenced under the 
alias t(a,b).
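For reference, the flattened-VALUES case under discussion can be reproduced
with a query along these lines (a sketch; the exact plan shape depends on
whether the planner pulls the subquery up):

```sql
EXPLAIN SELECT a, b
FROM (VALUES (1, 'one')) AS t(a, b)
WHERE t.a = 1;
```

When the single-row VALUES list is pulled up, the resulting plan likewise
shows no reference to the t(a,b) alias.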


--
regards, Andrei Lepikhov





Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-27 Thread Stepan Neretin
> Thank you for your suggestion. IMHO I'm not sure we need to have a
> regression test for this bug as it's just an oversight of recently
> committed code. My suggestion was just to help Stepan reproduce this
> failure using TAP tests.
>
>
Yes, I have reproduced the issue manually before your changes, but I am
unable to do so in the perl test.
Best Regards, Stepan Neretin!


Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread Umar Hayat
Hi,
In my opinion the issue could be ActiveState Perl (I can see an ActiveState
Perl path in your log file). In this thread [0] people faced the same
problem and solved it using Strawberry Perl.

Regards
Umar Hayat

[0] 
https://www.postgresql.org/message-id/flat/CADK3HHLQ1MNmfXqEvQi36D_MQrheOZPcXv2H3s6otMbSmfwjzg%40mail.gmail.com


On Mon, 28 Oct 2024 at 10:31, 黄铎彦  wrote:
>
> On Oct 28, 2024 1:58 GMT+8, Andres Freund  wrote:
> >Without knowing the error it's hard for us to help...
>
> The build log is attached. Note that the current environment on my Windows PC
> can build pg 16.3 successfully, as I followed
> https://www.postgresql.org/docs/16/install-windows-full.html.
> I didn't make any change to my dependencies. I use Windows 10 Home and Visual 
> Studio 2019.




Proposal to add exclusion constraint from existing index

2024-10-27 Thread Marcelo Fernandes
Hi folks,

From the source code, the `ATExecAddIndexConstraint` function does not support
creating an exclusion constraint from an existing gist index. Here's the
snippet I am talking about:

src/backend/commands/tablecmds.c
```c
/* Note we currently don't support EXCLUSION constraints here */
if (stmt->primary)
 constraintType = CONSTRAINT_PRIMARY;
else
 constraintType = CONSTRAINT_UNIQUE;
```

I couldn't find the exact reason in the original commit 88452d5ba6b3e, which
prompted this email, as I'm not sure whether there is a real limitation today
or whether the work simply wasn't viable at the time (2011).

My goal is to perform the following operations:

- Create a gist index CONCURRENTLY.
- From this index, create an exclusion constraint.

When I looked into the feasibility of this, I faced another problem:

Creating the exclusion constraint requires a second pass over the heap, which
in my local test compounded to 1/3 of the time (see `IndexCheckExclusion`
function for reference). The other 2/3 of the time was spent in the index
creation itself.

I wonder if it's possible to split this operation into two? Creating the index
first (allowing CONCURRENTLY), and then performing the heap rescan at another
time? Even if the rescan takes a good chunk of time, it would be preferable to
at least have the index part not blocking reads/writes. Provided, of course,
we can guarantee a way to not have conflicts creeping in during the gap
between those operations.

Currently, the only way I found of achieving this is by creating a trigger
that checks the exclusion manually, with an advisory lock on NEW.id to avoid
race conditions. But this isn't as good as having the constraint itself.
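For readers following along, a minimal sketch of that trigger-based workaround
could look like the following. The bookings table and all names are invented
for illustration, and the advisory lock here is taken on the conflict key
(room) rather than literally on NEW.id, which serializes the racing writers
the same way:

```sql
-- Hypothetical schema for illustration only.
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- gist support for the int column
CREATE TABLE bookings (id serial PRIMARY KEY, room int, during tstzrange);
CREATE INDEX CONCURRENTLY bookings_overlap_idx
    ON bookings USING gist (room, during);

CREATE FUNCTION bookings_check_overlap() RETURNS trigger AS $$
BEGIN
    -- Serialize writers touching the same room for this transaction.
    PERFORM pg_advisory_xact_lock(NEW.room);
    IF EXISTS (SELECT 1 FROM bookings b
               WHERE b.id IS DISTINCT FROM NEW.id
                 AND b.room = NEW.room
                 AND b.during && NEW.during) THEN
        RAISE EXCEPTION 'booking overlaps an existing one';
    END IF;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER bookings_no_overlap
    BEFORE INSERT OR UPDATE ON bookings
    FOR EACH ROW EXECUTE FUNCTION bookings_check_overlap();
```

The manual EXISTS check uses the gist index built concurrently above, but
unlike a real EXCLUDE constraint it offers no guarantee against rows inserted
before the trigger was installed.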

Suggestions about this proposal? Thanks in advance,
Marcelo.


Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread 黄铎彦
I could hardly find detailed and complete tutorials on building the latest
version with Visual Studio. I did use meson to generate sln and vcxproj files
in a Command Prompt (attachment: 00meson_cmd_to_gen_sln.txt). But after opening
the sln and running the menu command "Build -> Build Solution", it failed. The
contents of the Output Window in Visual Studio are in the attachment
01build_log.txt.
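For what it's worth, the commonly documented workflow skips the generated
.sln entirely and builds with meson directly (a hedged sketch; the build
directory and prefix are examples, run from a VS developer prompt):

```shell
meson setup build --prefix=C:\pgsql
meson compile -C build
meson test -C build
```

The `--backend vs` route is less exercised by the community than the default
ninja backend, so a .sln can still be generated separately just for code
navigation while building with the commands above.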


On 2024/10/27 15:07, Bill Smith wrote:

Check out this:


There's a table showing the setup & build commands using the old way and the 
new way (meson).



On Oct 21, 2024, at 4:12 PM, Mark Hill  wrote:


Thanks Bill!   Do you have a sample meson command for building that you could 
share?

Thanks, Mark
 
From: Bill Smith 
Sent: Friday, October 18, 2024 4:11 PM
To: Mark Hill 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: msvc directory missing in PostgreSQL 17.0
 

On Oct 18, 2024, at 4:05 PM, Mark Hill  wrote:
 
I just downloaded the Postgresql 17.0 and the extracted directory 
postgresql-17.0/src/tools does not contain

the “msvc” directory as it has since I’ve been building Postgres 12, 14, and 
16.   How am I supposed to build
PostgreSQL for Windows now?   I’m not sure what msvc_gendef.pl does.   How do 
we build Postgres for Windows
using Visual Studio?

 
You’ll want to use meson to build PG.  See the docs here:
17.7. Platform-Specific Notes (postgresql.org)



Thanks, Mark

Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-27 Thread Tender Wang
Alvaro Herrera wrote on Sunday, October 27, 2024 at 05:47:

> On 2024-Oct-25, Alvaro Herrera wrote:
>
> > On 2024-Oct-25, Tender Wang wrote:
> >
> > > Thanks for reporting. I can reproduce this memory invalid access on
> HEAD.
> > > After debuging codes, I found the root cause.
> > >  In DetachPartitionFinalize(), below code:
> > > att = TupleDescAttr(RelationGetDescr(partRel),
> > >attmap->attnums[conkey[i] - 1] - 1);
> > >
> > > We should use confkey[i] -1 not conkey[i] - 1;
> >
> > Wow, how embarrasing, now that's one _really_ stupid bug and it's
> > totally my own.  Thanks for the analysis!  I'll get this patched soon.
>
> Actually, now that I look at this again, I think this proposed fix is
> wrong; conkey/confkey confusion is not what the problem is.  Rather, the
> problem is that we're applying a tuple conversion map when none should
> be applied.  So the fix is to remove "attmap" altogether.  One thing
> that didn't appear correct to me was that the patch was changing one
> constraint name so that it appeared that the constrained columns were
> "id, p_id" but that didn't make sense: they are "p_id, p_jd" instead.
>

Yeah, the constraint name change made me wonder whether the patch was
correct. After your explanation, I understand it now.

> Then I realized that you're wrong that it's the referenced side that
> we're processing: what we're doing there is generate the fk_attrs list,
> which is the list of constrained columns (not the list of referenced
> columns, which is pk_attrs).
>
> I also felt like modifying the fkpart12 test rather than adding a
> separate fkpart13, so I did that.
>
> So here's a v2 for this.  (Commit message needs love still, but it's a
> bit late here.)
>
>
The v2 LGTM.
BTW, while reviewing the v2 patch, I found that parentConTup in the
foreach(cell, fks) block is not needed anymore. We can remove the related
code.

-- 
Thanks,
Tender Wang


Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-27 Thread Tom Lane
I wrote:
> In the no-good-deed-goes-unpunished department: buildfarm member
> hamerkop doesn't like this patch [1].  The diffs look like
> ...
> So what I'd like to do to fix this is to change
> - if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
> + if ((file = AllocateFile(filename, "r")) == NULL)

Well, that didn't fix it :-(.  I went so far as to extract the raw log
files from the buildfarm database, and what they show is that there is
absolutely no difference between the lines diff is claiming are
different:

-QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n
+QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n

It's the same both before and after 924e03917, which made the code
change depicted above, so that didn't help.

So I'm pretty baffled.  I suppose the expected and result files must
actually be different, and something in subsequent processing is
losing the difference before it gets to the buildfarm database.
But I don't have the ability to debug that from here.  Does anyone
with access to hamerkop want to poke into this?

Without additional information, the only thing I can think of that
I have any confidence will eliminate these failures is to reformat
the affected test cases so that they produce just a single line of
output.  That's kind of annoying from a functionality-coverage point
of view, but I'm not sure avoiding it is worth moving mountains for.

In any case, I'm disinclined to revert 924e03917.  It seems like a
good change on balance, even if it failed to fix whatever is
happening on hamerkop.

regards, tom lane




libedit history seems to be misbehaving / broken

2024-10-27 Thread Tomas Vondra
Hi,

I accidentally built master with libedit (instead of the readline I use
most of the time, due to a missing readline-devel package). And for a
while it was working fine, but then I realized the history is not
working anymore, and is broken in a weird way :-(

I'm not terribly familiar with libedit, so can't say if it's a bug in
libedit, or if we're just using it wrong in some way. I investigated
that while on a flight from pgconf.eu, but can't look into this further.
So let me at least share the behavior I observed, and what I found.

Note: All of this is with libedit-3.1-53.20240808cvs.fc40.x86_64, which
is what I currently get in Fedora 40.

Initially, history seems to work - more or less. Except for the detail
that we always "repeat" the last command from the history. So if you do
"SELECT 1" then the first query added to the history by the *next* psql
session is going to be "SELECT 1". And this happens even if you don't do
anything in psql. So if you enter & immediately exit 10x, you get 10
copies of the last query. Which is rather weird, and it also makes it
less convenient to walk the history (ctrl-r search is fine, ofc).

But then comes the following issue. Libedit identifies the history file
by adding _HiStOrY_V2_ as the first line. But it also limits the history
to 500 lines, and after adding the 501st line, it gets confused,
attempts to enforce the 500-line limit, and removes the first line,
which is the _HiStOrY_V2_ header. At that point the history more or less
stops working - psql can't access the history, apparently due to the
missing header. We still append new entries at the end, and can access
earlier commands from the same session, but nothing from the file.

And it stays broken forever, of course :-(


regards

-- 
Tomas Vondra





Re: msvc directory missing in PostgreSQL 17.0

2024-10-27 Thread Andres Freund
Hi, 

On October 27, 2024 5:50:51 AM EDT, "huangdy2...@qq.com"  
wrote:
>On Sun, 27 Oct 2024 at 17:38 GMT+8, 黄铎彦  wrote:
>
>
>>That repo seems to be something like Github action. But I would like to 
>>generate sln and vcxproj files in my Windows PC, then open Visual Studio, 
>>compile and navigate.
>
>
>I used `meson setup --backend vs` but got compile error with the sln file.

Without knowing the error it's hard for us to help... 

Andres 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-27 Thread Pavel Stehule
On Sunday, October 27, 2024 at 18:42, Tom Lane wrote:

> I wrote:
> > In the no-good-deed-goes-unpunished department: buildfarm member
> > hamerkop doesn't like this patch [1].  The diffs look like
> > ...
> > So what I'd like to do to fix this is to change
> > - if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
> > + if ((file = AllocateFile(filename, "r")) == NULL)
>
> Well, that didn't fix it :-(.  I went so far as to extract the raw log
> files from the buildfarm database, and what they show is that there is
> absolutely no difference between the lines diff is claiming are
> different:
>
> -QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n
> +QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n
>
> It's the same both before and after 924e03917, which made the code
> change depicted above, so that didn't help.
>
> So I'm pretty baffled.  I suppose the expected and result files must
> actually be different, and something in subsequent processing is
> losing the difference before it gets to the buildfarm database.
> But I don't have the ability to debug that from here.  Does anyone
> with access to hamerkop want to poke into this?
>
> Without additional information, the only thing I can think of that
> I have any confidence will eliminate these failures is to reformat
> the affected test cases so that they produce just a single line of
> output.  That's kind of annoying from a functionality-coverage point
> of view, but I'm not sure avoiding it is worth moving mountains for.
>
> In any case, I'm disinclined to revert 924e03917.  It seems like a
> good change on balance, even if it failed to fix whatever is
> happening on hamerkop.
>

+1

This is a very useful feature.

Pavel



> regards, tom lane
>


Re: libedit history seems to be misbehaving / broken

2024-10-27 Thread Tomas Vondra



On 10/27/24 20:03, Tom Lane wrote:
> Tomas Vondra  writes:
>> I'm not terribly familiar with libedit, so can't say if it's a bug in
>> libedit, or if we're just using it wrong in some way. I investigated
>> that while on a flight from pgconf.eu, but can't look into this further.
>> So let me at least share the behavior I observed, and what I found.
>> Note: All of this is with libedit-3.1-53.20240808cvs.fc40.x86_64, which
>> is what I currently get in Fedora 40.
> 
> libedit has been a hot mess for a long time.  psql's input.c has
> workarounds for assorted bugs that it has or had in prior versions,
> and what I suspect is that latest libedit has a different set of bugs.
> 
> FWIW, I don't observe any particular misbehavior with the very ancient
> libedit that macOS ships.  On Fedora 39, I notice something related to
> what you say: it seems like the "\q" ending a session gets repeated
> into .psql_history by the next session.  I'm surprised that it's not
> exactly the same as your results though, because it seems to be the
> same source version:
> 
> $ rpm -q libedit
> libedit-3.1-53.20240808cvs.fc39.x86_64
> 

That's probably because I usually terminate psql by Ctrl-D, not by
typing "\q". But yeah, if I use "\q" it gets added twice.

> Didn't try the too-many-lines behavior, but it looks like that
> is implemented totally by history_truncate_file(), so if that's
> busted it's surely their fault.
> 

Sounds likely. What surprises me a bit is that I haven't found any reports
of particularly similar bugs ... I'd have expected other people to hit this
too, but who knows.

-- 
Tomas Vondra





Re: Alias of VALUES RTE in explain plan

2024-10-27 Thread Tom Lane
Yasir  writes:
> On Sat, Oct 26, 2024 at 12:21 AM Tom Lane  wrote:
>> I forgot to mention a third problem, which is that reassigning the
>> alias during subquery pullup means it doesn't happen if subquery
>> pullup doesn't happen.

> Yes, that is by design.

By design?  How can you claim that's not a bug?  The alias(es)
associated with a VALUES clause surely should not vary depending
on whether the clause includes a volatile function --- not to
mention other unobvious reasons for flattening to happen or not.

regards, tom lane




Re: Alias of VALUES RTE in explain plan

2024-10-27 Thread Yasir
On Mon, Oct 28, 2024 at 1:07 AM Tom Lane  wrote:

> Yasir  writes:
> > On Sat, Oct 26, 2024 at 12:21 AM Tom Lane  wrote:
> >> I forgot to mention a third problem, which is that reassigning the
> >> alias during subquery pullup means it doesn't happen if subquery
> >> pullup doesn't happen.
>
> > Yes, that is by design.
>
> By design?  How can you claim that's not a bug?  The alias(es)
> associated with a VALUES clause surely should not vary depending
> on whether the clause includes a volatile function --- not to
> mention other unobvious reasons for flattening to happen or not.
>

By design of my solution, I was not taking it as a bug. But now, I agree
with your opinion.


>
> regards, tom lane
>


Allowing pg_recvlogical to create temporary replication slots

2024-10-27 Thread Torsten Förtsch
Hi,

This is my very first message to this mailing list. Please advise if I am
making any mistakes in the procedure.

The attached patch enables pg_recvlogical to create a temporary slot.

What is the next step in the process to get this change into official
postgres?

Thanks,
Torsten
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 3db520ed38..22ab96129c 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -50,6 +50,7 @@ static int	fsync_interval = 10 * 1000; /* 10 sec = default */
 static XLogRecPtr startpos = InvalidXLogRecPtr;
 static XLogRecPtr endpos = InvalidXLogRecPtr;
 static bool do_create_slot = false;
+static bool slot_is_temporary = false;
 static bool slot_exists_ok = false;
 static bool do_start_slot = false;
 static bool do_drop_slot = false;
@@ -104,6 +105,7 @@ usage(void)
 	printf(_("  -s, --status-interval=SECS\n"
 			 " time between status packets sent to server (default: %d)\n"), (standby_message_timeout / 1000));
 	printf(_("  -S, --slot=SLOTNAMEname of the logical replication slot\n"));
+	printf(_("  --temporary-slot   the slot created by --create-slot exists until the connection is dropped\n"));
 	printf(_("  -t, --two-phaseenable decoding of prepared transactions when creating a slot\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
@@ -227,10 +229,24 @@ StreamLogicalLog(void)
 	 * Connect in replication mode to the server
 	 */
 	if (!conn)
+	{
 		conn = GetConnection();
-	if (!conn)
-		/* Error message already written in GetConnection() */
-		return;
+		if (!conn)
+			/* Error message already written in GetConnection() */
+			return;
+
+		/* Recreate a replication slot. */
+		if (do_create_slot && slot_is_temporary)
+		{
+			if (verbose)
+				pg_log_info("recreating replication slot \"%s\"", replication_slot);
+
+			if (!CreateReplicationSlot(conn, replication_slot, plugin, slot_is_temporary,
+									   false, false, slot_exists_ok, two_phase))
+				goto error;
+			startpos = InvalidXLogRecPtr;
+		}
+	}
 
 	/*
 	 * Start the replication
@@ -719,6 +735,7 @@ main(int argc, char **argv)
 		{"start", no_argument, NULL, 2},
 		{"drop-slot", no_argument, NULL, 3},
 		{"if-not-exists", no_argument, NULL, 4},
+		{"temporary-slot", no_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -847,6 +864,9 @@ main(int argc, char **argv)
 			case 4:
 				slot_exists_ok = true;
 				break;
+			case 5:
+				slot_is_temporary = true;
+				break;
 
 			default:
 				/* getopt_long already emitted a complaint */
@@ -981,7 +1001,7 @@ main(int argc, char **argv)
 		if (verbose)
 			pg_log_info("creating replication slot \"%s\"", replication_slot);
 
-		if (!CreateReplicationSlot(conn, replication_slot, plugin, false,
+		if (!CreateReplicationSlot(conn, replication_slot, plugin, slot_is_temporary,
    false, false, slot_exists_ok, two_phase))
 			exit(1);
 		startpos = InvalidXLogRecPtr;


Re: libedit history seems to be misbehaving / broken

2024-10-27 Thread Tom Lane
Tomas Vondra  writes:
> I'm not terribly familiar with libedit, so can't say if it's a bug in
> libedit, or if we're just using it wrong in some way. I investigated
> that while on a flight from pgconf.eu, but can't look into this further.
> So let me at least share the behavior I observed, and what I found.
> Note: All of this is with libedit-3.1-53.20240808cvs.fc40.x86_64, which
> is what I currently get in Fedora 40.

libedit has been a hot mess for a long time.  psql's input.c has
workarounds for assorted bugs that it has or had in prior versions,
and what I suspect is that latest libedit has a different set of bugs.

FWIW, I don't observe any particular misbehavior with the very ancient
libedit that macOS ships.  On Fedora 39, I notice something related to
what you say: it seems like the "\q" ending a session gets repeated
into .psql_history by the next session.  I'm surprised that it's not
exactly the same as your results though, because it seems to be the
same source version:

$ rpm -q libedit
libedit-3.1-53.20240808cvs.fc39.x86_64

Didn't try the too-many-lines behavior, but it looks like that
is implemented totally by history_truncate_file(), so if that's
busted it's surely their fault.

regards, tom lane




Re: [Patch] remove duplicated smgrclose

2024-10-27 Thread Kirill Reshke
On Wed, 14 Aug 2024 at 11:35, Steven Niu  wrote:
>
> Junwang, Kirill,
>
> The split work has been done. I created a new patch removing the redundant
> smgrclose() calls, as attached.
> Please help review it.
>
> Thanks,
> Steven
>
> Steven Niu wrote on Monday, August 12, 2024 at 18:11:
>>
>> Kirill,
>>
>> Good catch!
>> I will split the patch into two to cover both cases.
>>
>> Thanks,
>> Steven
>>
>>
>>> Junwang Zhao wrote on Friday, August 9, 2024 at 18:19:
>>>
>>> On Fri, Aug 9, 2024 at 5:20 PM Kirill Reshke  wrote:
>>> >
>>> > On Thu, 1 Aug 2024 at 17:32, Junwang Zhao  wrote:
>>> > >
>>> > > Hi Steven,
>>> > >
>>> > > On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  wrote:
>>> > > >
>>> > > > Hello, hackers,
>>> > > >
>>> > > > I think there may be some duplicated code.
>>> > > > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and
>>> > > > smgrclose().
>>> > > > But both functions would close the SMgrRelation object; isn't that
>>> > > > duplicated behavior?
>>> > > >
>>> > > > So I make this patch. Could someone take a look at it?
>>> > > >
>>> > > > Thanks for your help,
>>> > > > Steven
>>> > > >
>>> > > > From Highgo.com
>>> > > >
>>> > > >
>>> > > Your change LGTM, but the patch does not apply to HEAD,
>>> > > so I generated the attached v2 using `git format-patch` with a commit message.
>>> > >
>>> > > --
>>> > > Regards
>>> > > Junwang Zhao
>>> >
>>> > Hi all!
>>> > This change looks good to me. However, i have an objection to these
>>> > lines from v2:
>>> >
>>> > >  /* Close the forks at smgr level */
>>> > > - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
>>> > > - smgrsw[which].smgr_close(rels[i], forknum);
>>> > > + smgrclose(rels[i]);
>>> >
>>> > Why do we do this? This seems to be an unrelated change given thread
>>> > $subj. This is just a pure refactoring job, which deserves a separate
>>> > patch. There is similar coding in
>>> > smgrdestroy function:
>>> >
>>> > ```
>>> > for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
>>> > smgrsw[reln->smgr_which].smgr_close(reln, forknum);
>>> > ```
>>> >
>>> > So, I feel like these two places should either be changed together or
>>> > not altered at all. And it is definitely a separate change.
>>>
>>> Yeah, I tend to agree with you, maybe we should split the patch
>>> into two.
>>>
>>> Steven, could you do this?
>>>
>>> >
>>> > --
>>> > Best regards,
>>> > Kirill Reshke
>>>
>>>
>>>
>>> --
>>> Regards
>>> Junwang Zhao

Hi!
Looks like the discussion on the subject is complete and no open items
are left, so I will try to mark the commitfest [1] entry as Ready for
Committer.

[1] https://commitfest.postgresql.org/50/5149/
-- 
Best regards,
Kirill Reshke




Re: Alias of VALUES RTE in explain plan

2024-10-27 Thread Yasir
On Fri, Oct 25, 2024 at 10:35 PM Tom Lane  wrote:

> Yasir  writes:
> > I have fixed the code to produce desired output by adding a few lines in
> > pull_up_simple_subquery().
> > Attached patch is divided in 2 files:
> >  - 001-Fix-Alias-VALUES-RTE.patch contains the actual fix.
> >  - 002-Fix-Alias-VALUES-RTE.patch contains the expected output changes
> > against the actual fix.
>
> I was initially skeptical about this, because we've been printing
> "*VALUES*" for a decade or more and there have been few complaints.
> So I wondered if the change would annoy more people than liked it.
> However, after looking at the output for awhile, it is nice that the
> columns of the VALUES are referenced with their user-given names
> instead of "columnN".  I think that's enough of an improvement that
> it'll win people over.
>

Hopefully, yes.


> However ... I don't like this implementation, not even a little
> bit.  Table/column aliases are assigned by the parser, and the
> planner has no business overriding them.  Quite aside from being
>

Totally agreed.


> a violation of system structural principles, there are practical
> reasons not to do it like that:
>
> 1. We'd see different results when considering plan trees than
> unplanned query trees.
>
> 2. At the place where you put this, some planning transformations have
> already been done, and that affects the results.  That means that
> future extensions or restructuring of the planner might change the
> results, which seems undesirable.
>
> I think the right way to make this happen is for the parser to
> do it, which it can do by passing down the outer query level's
> Alias to addRangeTableEntryForValues.  There's a few layers of
> subroutine calls between, but we can minimize the pain by adding
> a ParseState field to carry the Alias.  See attached.
>

Actually, I fixed this problem using two approaches: one on the parser
side, and a second in the planner. The one I submitted was the latter.
The first approach (attached partially) is almost identical to yours,
though yours manages the parent alias better. I submitted the second
solution because I wanted to make as few changes to the code as I could.


> My point 2 is illustrated by the fact that my patch produces
> different results in a few cases than yours does --- look at
> groupingsets.out in particular.  I think that's fine, and
> the changes that yours makes and mine doesn't look a little
> unprincipled.  For example, in the tests involving the "gstest1"
> view, if somebody wants nice labels on that view's VALUES columns
> then the right place to apply those labels is within the view.
> Letting a subsequent call of the view inject labels seems pretty
> action-at-a-distance-y.
>

> regards, tom lane
>
>
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 6593fd7d81..447e406255 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -932,7 +932,7 @@ interpret_AS_clause(Oid languageOid, const char *languageName,
 
 pstate->p_sourcetext = queryString;
 sql_fn_parser_setup(pstate, pinfo);
-q = transformStmt(pstate, stmt);
+q = transformStmt(pstate, stmt, NULL);
 if (q->commandType == CMD_UTILITY)
 	ereport(ERROR,
 			errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -951,7 +951,7 @@ interpret_AS_clause(Oid languageOid, const char *languageName,
 
 			pstate->p_sourcetext = queryString;
 			sql_fn_parser_setup(pstate, pinfo);
-			q = transformStmt(pstate, sql_body_in);
+			q = transformStmt(pstate, sql_body_in, NULL);
 			if (q->commandType == CMD_UTILITY)
 ereport(ERROR,
 		errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 28fed9d87f..827b560abe 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -64,7 +64,7 @@ static OnConflictExpr *transformOnConflictClause(ParseState *pstate,
  OnConflictClause *onConflictClause);
 static int	count_rowexpr_columns(ParseState *pstate, Node *expr);
 static Query *transformSelectStmt(ParseState *pstate, SelectStmt *stmt);
-static Query *transformValuesClause(ParseState *pstate, SelectStmt *stmt);
+static Query *transformValuesClause(ParseState *pstate, SelectStmt *stmt, Alias *alias);
 static Query *transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt);
 static Node *transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
 	   bool isTopLevel, List **targetlist);
@@ -220,6 +220,7 @@ parse_analyze_withcb(RawStmt *parseTree, const char *sourceText,
 Query *
 parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
   CommonTableExpr *parentCTE,
+  Alias *alias,
   bool locked_from_parent,
   bool resolve_unknowns)
 {
@@ -230,7 +231,7 @@ parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
 	pstate->p_locked_from_parent = locked_from_parent;
 	pstate->p_res

Re: Alias of VALUES RTE in explain plan

2024-10-27 Thread Yasir
On Sat, Oct 26, 2024 at 12:21 AM Tom Lane  wrote:

> I wrote:
> > However ... I don't like this implementation, not even a little
> > bit.
>
> I forgot to mention a third problem, which is that reassigning the
> alias during subquery pullup means it doesn't happen if subquery
> pullup doesn't happen.  As an example, with your patch:
>
> regression=# explain verbose select * from (values (1), (2)) v(x);
>  QUERY PLAN
> 
>  Values Scan on v  (cost=0.00..0.03 rows=2 width=4)
>Output: v.x
> (2 rows)
>
> regression=# explain verbose select * from (values (1), (random())) v(x);
>  QUERY PLAN
> -
>  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=8)
>Output: "*VALUES*".column1
> (2 rows)
>
> That's because the volatile function prevents subquery flattening.
>

Yes, that is by design. Since I used is_simple_values(), if the values
list is not a simple one (as in this case), the alias won't be
reassigned.


> regards, tom lane
>


Re: heap_inplace_lock vs. autovacuum w/ LOCKTAG_TUPLE

2024-10-27 Thread Noah Misch
On Sun, Oct 27, 2024 at 08:00:00AM +0300, Alexander Lakhin wrote:
> 27.10.2024 07:09, Noah Misch wrote:
> > On Sat, Oct 26, 2024 at 11:49:36AM -0700, Noah Misch wrote:
> > > intra-grant-inplace-db.spec got a novel failure today:
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sarus&dt=2024-10-26%2014%3A08%3A58
> > > 
> > > The isolationtester_waiting query is supposed to detect that step vac2 is
> > > blocked.  vac2 didn't finish within the timeout, but 
> > > isolationtester_waiting
> > > never considered it blocked.
> > > ... work on reproducing this.

The attached demo reproduces $SUBJECT for me.  The clue was 'CONTEXT:  while
scanning block 0 of relation "pg_catalog.pg_database"'.  That
vacuum_error_callback() message indicated VACUUM_ERRCB_PHASE_SCAN_HEAP as the
current phase.  That implies vac2 had not reached any inplace update, any
LOCKTAG_TUPLE, or any part of vac_update_datfrozenxid().  I bet vac2 was stuck
in LockBufferForCleanup().  Concurrently, a sequence of autovacuum workers
held a buffer pin blocking that cleanup.  The demo makes two changes:

a. Start intra-grant-inplace-db.spec step vac2 just after some autovacuum
   worker is blocked on the xid of the uncommitted GRANT.  While blocked, the
   worker holds a pin on the one pg_database page.  vac2 needs
   LockBufferForCleanup($THAT_PAGE).
b. Make LockBufferForCleanup() sleep 10ms after WAIT_EVENT_BUFFER_PIN, to give
   the next autovacuum worker time to win the pinning race.

LockBufferForCleanup() waits for a pin count to fall to 1.  While waiting, its
techniques are less sophisticated than what we have for lock.c heavyweight
locks.  UnpinBufferNoOwner() notifies the cleanup waiter, but nothing stops
another backend from pinning before the cleanup waiter reacts.  We have code
to cancel an autovacuum for the purpose of liberating a heavyweight lock, but
we don't have code like that to free a pin.
pg_isolation_test_session_is_blocked() doesn't detect buffer pin blockages.


Two fix candidates look most promising:

1. Unpin before heap_inplace_lock() sleeps.
2. Change intra-grant-inplace-db.spec to test freezing only MXIDs, not XIDs.

Let's do (1), as attached.  Apart from some arguably-convoluted call stacks,
this has no disadvantages.  An earlier version did unpin.
postgr.es/m/20240822073200.4f.nmi...@google.com stopped that, arguing,
"heap_update() doesn't optimize that way".  In light of $SUBJECT, I no longer
see heap_update() as the applicable standard.  Since autovacuum can run
anytime, it has an extra duty to be non-disruptive.  heap_update() during
auto-analyze won't wait much, because analyze takes a self-exclusive table
lock and is the sole writer of stats tables.  Inplace updates during
autovacuum are different.  They do contend with other transactions.  Since we
lack code to cancel an autovacuum to free a pin, a long-lived pin in
autovacuum is more disruptive than a long-lived lock in autovacuum.

While I've not tried it, I expect (2) would work as follows.  pg_database
won't contain MXIDs, so lazy_scan_noprune() would approve continuing without
the cleanup lock.  Unlike (1), this wouldn't help concurrency outside the
test.

> FWIW, there was a similar failure in August: [1], and I also could not
> reproduce that locally, yet wrote a preliminary analysis at [2] in the
> Unsorted section, in the hope to see it again and continue investigation.
> 
> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=iguana&dt=2024-08-29%2013%3A57%3A57
> [2] https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures

Thanks.  I had not known about iguana's failure.  What are the chances that
the buffer pin could explain that one?
Author: Noah Misch 
Commit: Noah Misch 

[not for commit] demo pin starvation from autovacuum inplace update

Test with intra-grant-inplace-db.spec.  Changes here:

a. Start intra-grant-inplace-db.spec step vac2 just after some
   autovacuum worker is blocked on the xid of the uncommitted GRANT.
   While blocked, the worker holds a pin on the one pg_database page.
   vac2 needs LockBufferForCleanup(that-page).
b. Make LockBufferForCleanup() sleep 10ms after WAIT_EVENT_BUFFER_PIN,
   to give the next autovacuum worker time to win the pinning race.

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0f02bf6..5268294 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5344,8 +5344,17 @@ LockBufferForCleanup(Buffer buffer)
SetStartupBufferPinWaitBufId(-1);
}
else
+   {
ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);
 
+   /*
+* Sleep to induce starvation.  With heavyweight locks, the
+* releaser grants to the next queue member, preventing
+* starvation.  Buffer pins have no such mechanism.

Re: race condition in pg_class

2024-10-27 Thread Noah Misch
On Thu, Aug 22, 2024 at 12:32:00AM -0700, Noah Misch wrote:
> This move also loses the optimization of unpinning before XactLockTableWait().
> heap_update() doesn't optimize that way, so that's fine.

In this other thread, I'm proposing to go back to unpinning:
https://postgr.es/m/20241027214035.8a.nmi...@google.com