Re: sunsetting md5 password support

2024-10-28 Thread Nathan Bossart
On Mon, Oct 28, 2024 at 04:10:29PM -0500, Jim Nasby wrote:
> Patch itself looks good, but it does leave me wondering if cleartext
> should also be deprecated?

I see that Tom has already chimed in on this point.  In any case, this is
probably a topic for another thread.

> Might also be worth mentioning deprecation in pg_hba.conf.

Yeah.  I vaguely recall waffling on whether to add one there, and for
whatever reason, I decided against it.  I've added it in v4.
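As a quick illustration of the intended behaviour with v4 applied (a sketch
only; the role name is made up and the exact warning wording is whatever the
committed patch ends up emitting):

    SET password_encryption = 'md5';
    CREATE ROLE sketch_md5_role PASSWORD 'secret';
    -- WARNING:  setting an MD5-encrypted password
    -- DETAIL:   MD5 password support is deprecated and will be removed in a
    --           future release of PostgreSQL.

    SET md5_password_warnings = off;
    ALTER ROLE sketch_md5_role PASSWORD 'secret';   -- no warning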

-- 
nathan
>From 716ca7332ed3e9e7e23146c926f93cb759a0d227 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 28 Oct 2024 19:54:02 -0500
Subject: [PATCH v4 1/1] Deprecate MD5 passwords.

MD5 has been considered to be unsuitable for use as a cryptographic
hash algorithm for some time.  Furthermore, MD5 password hashes in
PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing
the username and hashed password is sufficient to authenticate.
The SCRAM-SHA-256 method added in v10 is not subject to these
problems and is considered to be superior to MD5.

This commit marks MD5 password support in PostgreSQL as deprecated
and to be removed in a future release.  The documentation now
contains several deprecation notices, and CREATE ROLE and ALTER
ROLE now emit deprecation warnings when setting MD5 passwords.  The
warnings can be disabled by setting the md5_password_warnings
parameter to "off".

Reviewed-by: Greg Sabino Mullane, Jim Nasby
Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan
---
 .../passwordcheck/expected/passwordcheck.out  |  1 +
 .../expected/passwordcheck_1.out  |  1 +
 contrib/passwordcheck/sql/passwordcheck.sql   |  1 +
 doc/src/sgml/catalogs.sgml|  9 +++
 doc/src/sgml/client-auth.sgml | 17 +
 doc/src/sgml/config.sgml  | 24 +++
 doc/src/sgml/libpq.sgml   |  9 +++
 doc/src/sgml/protocol.sgml|  8 +++
 doc/src/sgml/ref/create_role.sgml |  8 +++
 doc/src/sgml/runtime.sgml | 10 
 src/backend/libpq/crypt.c | 10 
 src/backend/utils/misc/guc_tables.c   |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/libpq/crypt.h |  3 +++
 src/test/regress/expected/password.out| 15 
 src/test/regress/expected/password_1.out  |  9 +++
 16 files changed, 135 insertions(+)

diff --git a/contrib/passwordcheck/expected/passwordcheck.out 
b/contrib/passwordcheck/expected/passwordcheck.out
index 2027681daf..dfb2ccfe00 100644
--- a/contrib/passwordcheck/expected/passwordcheck.out
+++ b/contrib/passwordcheck/expected/passwordcheck.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/expected/passwordcheck_1.out 
b/contrib/passwordcheck/expected/passwordcheck_1.out
index 5d8d5dcc1c..9519d60a49 100644
--- a/contrib/passwordcheck/expected/passwordcheck_1.out
+++ b/contrib/passwordcheck/expected/passwordcheck_1.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/sql/passwordcheck.sql 
b/contrib/passwordcheck/sql/passwordcheck.sql
index 1fbd6b0e96..5953ece5c2 100644
--- a/contrib/passwordcheck/sql/passwordcheck.sql
+++ b/contrib/passwordcheck/sql/passwordcheck.sql
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 
 CREATE USER regress_passwordcheck_user1;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 964c819a02..0b9ca087c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1618,6 +1618,15 @@
will store the md5 hash of xyzzyjoe.
   
 
+  
+   
+Support for MD5-encrypted passwords is deprecated and will be removed in a
+future release of PostgreSQL.  Refer to
+ for details about migrating to another
+password type.
+   
+  
+
   
If the password is encrypted with SCRAM-SHA-256, it has the format:
 
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 51343de7ca..782b49c85a 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -531,6 +531,15 @@ include_dir directory
   user's password. See 
   for details.
  
+ 
+  
+   Support for MD5-encrypted passwords is deprecated and will be
+   removed in a future release of
+   PostgreSQL.  Refer to
+for details about migrating to
+   another password type.
+  
+ 
 

 
@@ -1260,6 +1269,14 @@ omicron bryanh  guest1
server is encrypted for SCRAM (see below), then SCRAM-based
authentication will automatically be chosen instead.
   
+
+  
+   
+Support for MD5-encrypted

freespace.c modifies buffer without any locks

2024-10-28 Thread Andres Freund
Hi,

I just noticed that fsm_vacuum_page() modifies a buffer without even holding a
shared lock.  That quite obviously seems like a violation of the buffer
locking protocol:

/*
 * Try to reset the next slot pointer. This encourages the use of
 * low-numbered pages, increasing the chances that a later vacuum can
 * truncate the relation.  We don't bother with a lock here, nor with
 * marking the page dirty if it wasn't already, since this is just a 
hint.
 */
if (BufferPrepareToSetHintBits(buf))
{
((FSMPage) PageGetContents(page))->fp_next_slot = 0;
BufferFinishSetHintBits(buf);
}


In the commit (15c121b3ed7) adding the current freespace code, there wasn't
even a comment remarking upon that oddity.  10 years later Tom added a
comment, in 2b1759e2675f.


I noticed this while adding a debug mode in which buffers are mprotected
PROT_NONE/PROT_READ/PROT_READ|PROT_WRITE depending on the buffer's state.


Is there any good reason to avoid a lock here? Compared to the cost of
exclusively locking buffers during RecordAndGetPageWithFreeSpace() the cost of
doing so during FreeSpaceMapVacuum*() seems small?
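For illustration, something like the following minimal sketch is what I have
in mind (taking the buffer lock around the hint update, and ignoring the
question of marking the page dirty since the existing comment says it's just
a hint; whether the extra LockBuffer cost is acceptable is exactly the
question above):

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    ((FSMPage) PageGetContents(page))->fp_next_slot = 0;
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);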




Somewhat relatedly, I don't think I understand why it's a good idea to
reset fp_next_slot to 0 in fsm_vacuum_page(), at least when doing so
unconditionally.

When extending a relation, it seems we'll constantly reset the search back to
the start of the range, even though we pretty much know that there's no space
earlier in the relation - otherwise we'd not have extended.

And when called from FreeSpaceMapVacuumRange() we'll reset fp_next_slot to
somewhere that wasn't actually vacuumed, afaict?

Greetings,

Andres Freund




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 6:03 PM David Rowley  wrote:

> We don't seem to be agreeing on much here... :-(
>
> On Tue, 29 Oct 2024 at 13:30, David G. Johnston
>  wrote:
> >
> > On Mon, Oct 28, 2024 at 3:54 PM David Rowley 
> wrote:
> >> I'm concerned about the wording "all index-scan related".  It's not
> >> that clear if that would include Bitmap Index Scans or not.
> >
> >
> > That was partially the point of writing "all" there - absent other
> information, and seeing how index-only scans were treated, I presumed it
> was indeed actually or effectively a switch for all.  If it is not it
> should be made clear which node types with the word index in them are not
> affected.
>
> I'm very much against mentioning which things are *not* affected by
> settings. It doesn't seem like a very sustainable way to write
> documentation.
>

The documentation presently uses the term "index-scan related" and it is
unclear what exactly that is supposed to cover.  My addition of the word
"all" doesn't materially change this, other than making certain that the
"index-only-scan related" nodes are covered, which gets clarified and is
cross-referenced.  If you are uncertain whether adding "all" is meant to
cover Bitmap Index Scans, then that uncertainty already exists in the current
wording.  I just added "all" to be explicit about that fact, or at least
that is what I thought I did.

For me, the answer to "are bitmap index scans disabled" by setting
enable_indexscans to off is "yes" and does not require explanation.  If the
real answer is "no" then please propose a change that can disabuse me of my
belief.


> > Is there a listing of all node types produced by PostgreSQL (with the
> explain output naming) along with which ones are affected by which enable_*
> knobs (possibly multiple for something like Bitmap Index Scan)?
>
> No. We purposefully do our best not to document executor nodes. The
> enable_* GUCs is one place where it's hard to avoid.
>

For education, mainly mine, not to add to the documentation; though our
lack of detail here for what are user-facing things is IMO unfortunate.


> >>
> >> Could we just add "The  setting
> >> must also be enabled to have the query planner consider
> >> index-only-scans"?
> >
> >
> > I'd like to stick with a conjunction there but agree the "must be
> enabled" wording is preferrable, avoiding the double-negative.
> >
> > "The default is on, but the  setting must also be enabled."
> >
> > The 'to have the...' part seems to just be redundant.
>
> I think it's confusing to include this as part of the mention of what
> the default value is. The default value and enable_indexscans being
> the master switch aren't at all related.
>
>
Fair point.  I'm good with your proposed change here.

David J.


Re: detoast datum into the given buffer as a optimization.

2024-10-28 Thread Andy Fan


Hi Tom,

> Andy Fan  writes:
>>  *   Note if caller provides a non-NULL buffer, it is the duty of caller
>>  * to make sure it has enough room for the detoasted format (Usually
>>  * they can use toast_raw_datum_size to get the size)
>
> This is a pretty awful, unsafe API design.

Sorry that I expressed my thoughts incorrectly. I'm not going to
refactor "detoast_attr"; I'm going to add a new API named
"detoast_attr_buffer", which is very similar to text_to_cstring and
text_to_cstring_buffer. Most users can still use detoast_attr; only the
users who care about the MemoryContext or memcpy issue would use the
detoast_attr_buffer variant.

> It puts it on the caller
> to know how to get the detoasted length, and it implies double
> decoding of the toast datum.

I very nearly gave up on this idea today because of this; later I
realized that something I knew when writing the first message had since
slipped my mind. Since it is really confusing, I want to highlight it
here so that more people can double-check it.

I thought 'toast_raw_datum_size' was the existing function to get the
"detoasted length" for the caller of detoast_attr_buffer, so it seemed
reasonable to assume the caller knows how to get the detoasted
length.

What suddenly confused me was whether it is really correct to use
"toast_raw_datum_size". In "toast_raw_datum_size":

if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
/* va_rawsize is the size of the original datum -- including 
header */
struct varatt_external toast_pointer;

VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
result = toast_pointer.va_rawsize;
}

  We just return the va_rawsize directly. Does that work for a datum which
  is compressed first and then stored externally on disk? After some
  more research, it turns out to be correct, since the rawsize is the size of
  the uncompressed data, and we also rely on that fact in
  1b393f4e5db4fd6bbc86a4e88785b6945a1541d0. This is why I said:

 > One of the key point is we can always get the varlena rawsize cheaply
 > without any real detoast activity in advance, thanks to the existing
 > varlena design.

 This is the fact I had forgotten today.

Since the user can get the size easily now, is it cheap to do so? Looking
at the code in toast_raw_datum_size, I find it hard to say it is
expensive.

>
> How about a variant like
>
> struct varlena *
> detoast_attr_cxt(struct varlena *attr, MemoryContext cxt)
>
> which promises to allocate the result in the specified context?
> That would cover most of the practical use-cases, I think.

Yes, it works for some use cases, and it is similar to what I did in
[1] (search for detoast_attr_ext).  However, it can't support the case where
the user wants to detoast the data into a given buffer (to avoid the later
memcpy), so detoast_attr_buffer is my preferred API right now. If that is
not doable, detoast_attr_cxt also works for me.

Do you still think detoast_attr_buffer is not an acceptable API at the
high-level design? I'm working on an implementation, but I want to have
some agreement on the high-level design first.
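To make the intended usage concrete, here is a rough sketch of how a caller
could use the proposed function (detoast_attr_buffer does not exist yet and
its exact signature is still open; toast_raw_datum_size is the existing
function, and toasted_datum is just a placeholder Datum for illustration):

    /* caller supplies the destination buffer, sized via toast_raw_datum_size */
    Size            rawsize = toast_raw_datum_size(toasted_datum);
    struct varlena *buf = (struct varlena *) palloc(rawsize);

    /* detoast directly into the caller's buffer, avoiding a later memcpy */
    detoast_attr_buffer((struct varlena *) DatumGetPointer(toasted_datum), buf);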

[1]
https://www.postgresql.org/message-id/attachment/160491/v10-0001-shared-detoast-datum.patch

-- 
Best Regards
Andy Fan





Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Michael Paquier
On Mon, Oct 28, 2024 at 01:07:16PM +0100, Daniel Gustafsson wrote:
> +1 for applying backpatched to at least 17 but possibly further down judging
> by the linked threads.

When using ActiveState perl, being able to call `perl` from a PATH
requires one to register with a central service related to the company
that provides these binaries.  Still recommending it even on stable
branches makes me really uneasy.
--
Michael


signature.asc
Description: PGP signature


Re: Why don't we consider explicit Incremental Sort?

2024-10-28 Thread Andrei Lepikhov

On 10/10/24 09:18, Richard Guo wrote:

On Sun, Sep 22, 2024 at 1:38 PM David Rowley  wrote:
I've pushed this patch after tweaking this part in the commit message.
Thank you both for your reviews.

My apologies for the late review, but IMO there are some minor weaknesses.
While working on improving the cost_sort model [1], adding to the model the
number of columns to be sorted and the number of comparisons to be made
(through the stadistinct of the first column), I found that one test in
aggregates.sql doesn't like IncrementalSort:


'Utilise the ordering of merge join to avoid a Sort operation'

After digging a little, I realised that although the idea that
IncrementalSort is better than a plain Sort is generally correct, the
assumption that it is better all the time seems wrong.
For example, an IncrementalSort following an index can choose a sort
order whose first columns have few distinct values, causing more
comparisons - remember, we still sort tuples inside a group; or it can
use an index on multiple columns while needing only the first column,
with an additional Sort on many other columns fetched from the heap, etc.


Of course, I am providing highly skewed cases, and Sort + SeqScan will
basically resolve the issue. But I think both possible sort strategies
should be considered; don't heuristically give a chance only to
IncrementalSort. At the very least, an IndexScan can beat a SeqScan
because of index clause selectivity.


[1] 
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com


--
regards, Andrei Lepikhov





Re: protocol-level wait-for-LSN

2024-10-28 Thread Tatsuo Ishii
> The patch adds a protocol extension called _pq_.wait_for_lsn as well
> as a libpq connection option wait_for_lsn to activate the same.  (Use
> e.g., psql -d 'wait_for_lsn=1'.)
> 
> With this protocol extension, two things are changed:
> 
> - The ReadyForQuery message sends back the current LSN.

If another protocol extension X also tries to add something to the
ReadyForQuery message, what would happen?
Currently the ReadyForQuery message looks like this:

Byte1('Z')
Int32
Byte1

With the wait_for_lsn extension, it becomes:

Byte1('Z')
Int32
Byte1
String

Suppose extension X wants to extend it like this:

Byte1('Z')
Int32
Byte1
Int32

It seems impossible for both to coexist.

Does this mean that once the wait_for_lsn extension is brought into the
frontend/backend protocol specification, no other extension that touches
ReadyForQuery can be defined?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
On Wed, 23 Oct 2024 at 13:51, David G. Johnston
 wrote:
> Went with a slightly different wording that seems to flow better with the 
> xrefs I added between the two options.

-Enables or disables the query planner's use of index-scan plan
-types. The default is on.
+Enables or disables the query planner's use of all index-scan related plan

I'm concerned about the wording "all index-scan related".  It's not
that clear if that would include Bitmap Index Scans or not. I think
it's better to explicitly mention index-only-scans to make it clear
which nodes are affected.

+types. The default is on. The
index-only-scan plan types
+can be independently disabled by setting 
+to off.

I wondered if it's better to reference the enable_indexonlyscan GUC
here rather than document what enable_indexonlyscan does from the
enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
could be added?

-The default is on.
+The default is on. However, this setting
has no effect if
+ is set to
off.

Could we just add "The  setting
must also be enabled to have the query planner consider
index-only-scans"?

I've attached that in patch form.

David
Title: 19.7. Query Planning

19.7. Query Planning

19.7.1. Planner Method Configuration
19.7.2. Planner Cost Constants
19.7.3. Genetic Query Optimizer
19.7.4. Other Planner Options

19.7.1. Planner Method Configuration
   These configuration parameters provide a crude method of
   influencing the query plans chosen by the query optimizer. If
   the default plan chosen by the optimizer for a particular query
   is not optimal, a temporary solution is to use one
   of these configuration parameters to force the optimizer to
   choose a different plan.
   Better ways to improve the quality of the
   plans chosen by the optimizer include adjusting the planner cost
   constants (see Section 19.7.2),
   running ANALYZE manually, increasing
   the value of the default_statistics_target configuration parameter,
   and increasing the amount of statistics collected for
   specific columns using ALTER TABLE SET
   STATISTICS.
enable_async_append (boolean)
    Enables or disables the query planner's use of async-aware
    append plan types. The default is on.

enable_bitmapscan (boolean)
    Enables or disables the query planner's use of bitmap-scan plan
    types. The default is on.

enable_gathermerge (boolean)
    Enables or disables the query planner's use of gather
    merge plan types. The default is on.

enable_group_by_reordering (boolean)
    Controls if the query planner will produce a plan which will provide
    GROUP BY keys sorted in the order of keys of
    a child node of the plan, such as an index scan.  When disabled, the
    query planner will produce a plan with GROUP BY
    keys only sorted to match the ORDER BY clause,
    if any. When enabled, the planner will try to produce a more
    efficient plan. The default value is on.

enable_hashagg (boolean)
    Enables or disables the query planner's use of hashed
    aggregation plan types. The default is on.

enable_hashjoin (boolean)
    Enables or disables the query planner's use of hash-join plan
    types. The default is on.

enable_incremental_sort (boolean)
    Enables or disables the query planner's use of incremental sort steps.
    The default is on.

enable_indexscan (boolean)
    Enables or disables the query planner's use of index-scan and
    index-only-scan plan types.  The default is on.
    Also see enable_indexonlyscan.

enable_indexonlyscan (boolean)
    Enables or disables the query planner's use of index-only-scan plan
    types (see Section 11.9).
    The default is on.  The
    enable_indexscan setting must also be
    enabled to have the query planner consider index-only-scans.

enable_material (boolean)
    Enables or disables the query planner's use of materialization.
    It is impossible to suppress materialization entirely,
    but turning this variable off prevents the planner from inserting
    materialize nodes except in cases where it is required for correctness.
    The default is on.

enable_memoize (boolean)
    Enables or disables the query planner's use of memoize plans for
    caching results from parameterized scans inside nested-loop joins.
    This plan type allows scans to the underlying plans to be skipped when
    the results for the current parameters are already in the cache.  Less
    comm

Re: Considering fractional paths in Append node

2024-10-28 Thread Andy Fan
Nikita Malakhov  writes:

> Hi,
>
> Andy, thank you, I've checked this thread out along with run-time
> partition pruning.
I'm not sure about the relationship between this topic and run-time
partition pruning...

> I've spend some time hovering on the tuple_fraction field usage and would 
> disagree
> with you on this topic - it is already used on the RelOptInfo level later on, 
> in
> generate_orderedappend_paths()

It looks like you are right that root->tuple_fraction is already used at the
RelOptInfo level in generate_orderedappend_paths(). But we have also tried
not to use it at the RelOptInfo level, for example in set_subquery_pathlist.
See:

"""
/*
 * We can safely pass the outer tuple_fraction down to the subquery if the
 * outer level has no joining, aggregation, or sorting to do. Otherwise
 * we'd better tell the subquery to plan for full retrieval. (XXX This
 * could probably be made more intelligent ...)
 */
"""

I'm not sure the "more intelligent" would be just use it directly. 

So I'm not saying we can't do this, just that the facts are:
(a)  root->tuple_fraction is not exactly the same as a RelOptInfo's
tuple_fraction.
(b)  We have used root->tuple_fraction at the RelOptInfo level in some cases
and also tried not to use it in some other cases (only using it in
situations similar to what I did before).

It looks like different committers have different opinions on this.

-- 
Best Regards
Andy Fan





Questions About TODO: Issuing NOTICEs for row count differences in EXPLAIN ANALYZE

2024-10-28 Thread KAZAR Ayoub
Hello Hackers,
I am currently looking into the following TODO item, "Have EXPLAIN ANALYZE
issue NOTICE messages when the estimated and actual row counts differ by a
specified percentage."
What's the current status of this TODO, and is there any prior discussion
or rationale behind it?
Specifically:
- How should we come up with a percentage for row count differences (a fixed
value or a dynamic one)?
- Should we consider a configurable parameter for users?
- Is there anything extra to consider?

Thank you.


Re: On disable_cost

2024-10-28 Thread David Rowley
On Sat, 19 Oct 2024 at 01:09, Laurenz Albe  wrote:
> Here is my attempt on that paragraph:
>
>   When using the enable/disable flags to disable plan node types, many of
>   the flags only discourage the use of the corresponding plan node and don't
>   outright disallow the planner's ability to use the plan node type.
>   Otherwise, certain queries could not be executed for lack of an alternative
>   to using a disabled plan node.  As a consequence, it is possible that the
>   planner chooses a plan using a node that has been disabled.  When this
>   happens, the EXPLAIN output will indicate this fact.

I think that looks pretty good. However, I would like to keep the part
saying that the possibility of disabled nodes still being used is
intentional, mostly just to make it clear that it's not a bug. We get so
many false bug reports that I feel it's worthwhile mentioning that
explicitly.

Maybe since you dropped that sentence to shorten the paragraph, we
could instead just drop the "Otherwise, certain" sentence.

Also, the concern about using "this". How about we just write "When
the resulting plan contains a disabled node, the
EXPLAIN output will indicate this fact.", which
makes that self-contained.

That becomes:

When using the enable/disable flags to disable plan node types, many of
the flags only discourage the use of the corresponding plan node and don't
outright disallow the planner's ability to use the plan node type.  This
is by design so that the planner still maintains the ability to form a
plan for a given query.  When the resulting plan contains a disabled node,
the EXPLAIN output will indicate this fact.

David


disabled_docs_v3.patch
Description: Binary data


Re: pgsql: Implement pg_wal_replay_wait() stored procedure

2024-10-28 Thread Alexander Korotkov
On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas  wrote:
>
> On 25/10/2024 14:56, Alexander Korotkov wrote:
> > I see that pg_wal_replay_wait_status() might look weird, but it seems
> > to me like the best of feasible solutions.
>
> I haven't written many procedures, but our docs say:
>
>  > Procedures do not return a function value; hence CREATE PROCEDURE
> lacks a RETURNS clause. However, procedures can instead return data to
> their callers via output parameters.
>
> Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them.  And that couldn't be
default values.  That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck whether it's possible to work around either of these.
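For instance, with a hypothetical OUT-parameter signature (just to
illustrate point 1 - a CALL issued from SQL still has to supply a
placeholder for the OUT argument; the third parameter below is made up):

    CALL pg_wal_replay_wait('0/55DA24F', 1000, NULL);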

> > Given that
> > pg_wal_replay_wait() procedure can't work concurrently to a query
> > involving pg_wal_replay_wait_status() function, I think
> > pg_wal_replay_wait_status() should be stable and parallel safe.
>
> If you call pg_wal_replay_wait() in the backend process, and
> pg_wal_replay_wait_status() in a parallel worker process, it won't
> return the result of the wait. Probably not what you'd expect. So I'd
> argue that it should be parallel unsafe.

Oh, sorry.  You're absolutely correct.  That should be parallel unsafe.

> > This is the brief answer.  I will be able to come back with more
> > details on Monday.
>
> Thanks. A few more minor issues I spotted while playing with this:
>
> - If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps
> around and doesn't wait at all
> - You can pass NULLs as arguments. That should probably not be allowed,
> or we need to document what it means.
>
> This is disappointing:
>
> > postgres=# set default_transaction_isolation ='repeatable read';
> > SET
> > postgres=# call pg_wal_replay_wait('0/55DA24F');
> > ERROR:  pg_wal_replay_wait() must be only called without an active or 
> > registered snapshot
> > DETAIL:  Make sure pg_wal_replay_wait() isn't called within a transaction 
> > with an isolation level higher than READ COMMITTED, another procedure, or a 
> > function.
>
> Is there any way we could make that work? Otherwise, the feature just
> basically doesn't work if you use repeatable read.

Thank you for catching this.  The last one is really disappointing.
I'm exploring what could be done there.

--
Regards,
Alexander Korotkov
Supabase




Re: MergeAppend could consider sorting cheapest child path

2024-10-28 Thread Nikita Malakhov
Hi!

I've checked this thread and the examples in it, and do not see stable
improvements in the base tests. Sometimes the base tests are considerably
slower with the patch, for example:


explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
   ->  Merge Join  (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
   Sort Key: t1.b
   ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
 Sort Key: t1_1.b
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on matest0 t1_1  (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
 ->  Index Scan using matest1i on matest1 t2_2
 (cost=0.15..76.53 rows=9 width=4) (never executed)
   Filter: (c = d)
 Planning Time: 0.252 ms
 Execution Time: 0.048 ms
(19 rows)

explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.57..20.88 rows=10 width=16) (actual time=0.004..0.004
rows=0 loops=1)
   ->  Merge Join  (cost=0.57..189.37 rows=93 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.29..98.56 rows=1851 width=16) (actual
time=0.002..0.003 rows=0 loops=1)
   Sort Key: t1.b
   ->  Index Scan using matest0i on matest0 t1_1
 (cost=0.12..8.14 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.001..0.001 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
 ->  Index Scan using matest1i on matest1 t2_2
 (cost=0.15..76.53 rows=9 width=4) (never executed)
   Filter: (c = d)
 Planning Time: 0.278 ms
 Execution Time: 0.025 ms
(16 rows)

(patched)
explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
   ->  Merge Join  (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
   Sort Key: t1.b
   ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
 Sort Key: t1_1.b
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on matest0 t1_1  (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
  

Re: Considering fractional paths in Append node

2024-10-28 Thread Nikita Malakhov
Hi,

Andy, thank you, I've checked this thread out along with run-time partition
pruning.
I've spent some time looking at the tuple_fraction field usage and would
disagree with you on this topic - it is already used at the RelOptInfo level
later on, in generate_orderedappend_paths().
I mean the following piece:
if (root->tuple_fraction > 0)
{
double path_fraction = (1.0 / root->tuple_fraction);
Path cheapest_consider_fraction;

cheapest_fractional =
get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
pathkeys, NULL, path_fraction);
...

function, so it does not seem incorrect to use its value for a single
relation in a subquery - I agree that we do not have an accurate estimate at
this level, but we could use the one we already have.
I've also tried hard to find an example where this patch could break
something, but without success.

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-28 Thread Nathan Bossart
Committed.

-- 
nathan




Re: Conflict Detection and Resolution

2024-10-28 Thread Diego Fronza
Hello hackers,

Hi, I'm Diego. I work for Percona and have recently started working on
PostgreSQL, and I would like to contribute to the project moving forward.

I have been following this thread since the beginning, but due to my
limited knowledge of the overall code structure, my first review of the
provided patches was more focused on validating the logic and general flow.

I have been testing the provided patches and so far the only issue I have
is the one reported about DirtySnapshot scans over a B-tree with parallel
updates, which may skip/not find some records.

That said, I'd like to know whether it's worthwhile to pull the proposed fix
from [0] and validate/update the code to fix the issue, or whether other,
better solutions are being discussed?

Thanks for your attention,
Diego

[0]:
https://www.postgresql.org/message-id/flat/cantu0oizitbm8+wdtkktmzv0rhgbroygwwqsqw+mzowpmk-...@mail.gmail.com#74f5f05594bb6f10b1d882a1ebce377c

On Mon, Oct 21, 2024 at 2:04 AM shveta malik  wrote:

> On Fri, Oct 18, 2024 at 4:30 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, October 9, 2024 2:34 PM shveta malik <
> shveta.ma...@gmail.com> wrote:
> > >
> > > On Wed, Oct 9, 2024 at 8:58 AM shveta malik 
> > > wrote:
> > > >
> > > > On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond
> > >  wrote:
> > > > >
> > > >
> > >
> > > Please find few comments on v14-patch004:
> > >
> > > patch004:
> > > 1)
> > > GetConflictResolver currently errors out when the resolver is
> last_update_wins
> > > and track_commit_timestamp is disabled. It means every conflict
> resolution
> > > with this resolver will keep on erroring out. I am not sure if we
> should emit
> > > ERROR here. We do emit ERROR when someone tries to configure
> > > last_update_wins but track_commit_timestamp is disabled. I think that
> should
> > > suffice. The one in GetConflictResolver can be converted to WARNING
> max.
> > >
> > > What could be the side-effect if we do not emit error here? In such a
> case, the
> > > local timestamp will be 0 and remote change will always win.
> > > Is that right? If so, then if needed, we can emit a warning saying
> something like:
> > > 'track_commit_timestamp is disabled and thus remote change is applied
> > > always.'
> > >
> > > Thoughts?
> >
> > I think simply reporting a warning and applying remote changes without
> further
> > action could lead to data inconsistencies between nodes. Considering the
> > potential challenges and time required to recover from these
> inconsistencies, I
> > prefer to keep reporting errors, in which case users have an opportunity
> to
> > resolve the issue by enabling track_commit_timestamp.
> >
>
> Okay, makes sense. We should raise ERROR then.
>
> thanks
> Shveta
>
>
>


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
Here are my review comments for v44-0001.

==
doc/src/sgml/ref/create_publication.sgml

1.
-  When a column list is specified, only the named columns are replicated.
+  When a column list is specified, all columns (except generated columns)
+  of the table are replicated.
   If no column list is specified, all columns of the table are replicated
   through this publication, including any columns added later. It has no

Huh? This seems very wrong.

I think it should have been like:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all table columns (except
generated columns) are replicated...

==
src/backend/replication/logical/proto.c

2.
+bool
+logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (!columns && att->attgenerated)
+ return false;
+
+ /*
+ * Check if a column is covered by a column list.
+ */
+ if (columns && !bms_is_member(att->attnum, columns))
+ return false;
+
+ return true;
+}

I thought this could be more simply written as:

{
if (att->attisdropped)
  return false;

/* If a column list was specified only publish the specified columns. */
if (columns)
  return bms_is_member(att->attnum, columns);

/* If a column list was not specified publish everything except
generated columns. */
return !att->attgenerated;
}

==
src/backend/replication/pgoutput/pgoutput.c

3.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
+ continue;
+
+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
  continue;
+ }
+

  nliveatts++;
  }

  /*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
  */
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
  {
  bms_free(cols);
  cols = NULL;
~

That code still looks strange to me. I think that unconditional
'continue' for attgenerated is breaking the meaning of 'nliveatts'
(which I take to mean 'count-of-the-attrs-to-be-published').

AFAICT the code should be more like this:

if (att->attgenerated)
{
  /* Generated cols are skipped unless they are present in a column list. */
  if (!bms_is_member(att->attnum, cols))
continue;

  gencolpresent = true;
}

==
src/test/regress/sql/publication.sql

4.
 ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

+-- ok: generated column "d" can be in the list too
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
instead of ADD TABLE, so then you can remove the earlier DROP and DROP
the table only once.

==
src/test/subscription/t/031_column_list.pl

5.
+# TEST: Dropped columns are not considered for the column list, and generated
+# columns are not replicated if they are not explicitly included in the column
+# list. So, the publication having a column list except for those columns and a
+# publication without any column (aka all columns as part of the columns list)
+# are considered to have the same column list.

Hmm. I don't think this wording is quite right "without any column".
AFAIK the original intent of this test was to prove only that
dropped/generated columns were ignored for the NULL column list logic.

That last sentence maybe should say more like:

So a publication with a column list specifying all table columns
(excluding only dropped and generated columns) is considered to be the
same as a publication that has no column list at all for that table.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:

>
> I've attached that in patch form.
>
>
 -Enables or disables the query planner's use of index-scan plan
-types. The default is on.
+Enables or disables the query planner's use of index-scan and
+index-only-scan plan types.  The default is on.
+Also see .

I think the original wording "index-scan plan types" is what is confusing
me.  The plural "types" turns "index-scan plan" into a category of plans
rather than the single plan type "index scan".

Your proposed wording actually (accidentally?) fixes this, because now the
plural "types" actually refers to two individual plan nodes, "index scan" and
"index-only scan".

The hyphenation still reads a bit odd but ok.

I am ok with this revision (and the patch as a whole) I suppose but I still
feel like something is missing here.  Though probably that something would
fit better in an overview page rather than trying to get the settings to
explain all this to the reader.

David J.


Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Andrei Lepikhov

On 10/28/24 16:48, Alena Rybakina wrote:

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:
And the last patch is a demo of how I'm going to use the previous 
three patches and add one more strategy to improve the order of 
columns in the GROUP-BY clause.


To be honest, I didn’t find information about this in the code, but did 
I understand everything correctly?

Yes


2. I noticed that statistics of distinct values are calculated several
times. Maybe I am missing something, but this can be avoided if these
statistics can be saved after calculation. For example, I saw that it is
possible that you calculate the same statistic information for the same
equivalence members in cost_incremental_sort and identify_sort_ecmember.
Is it possible to store the information in a structure and use it later?
Hmm, I don't see multiple calculations. em_ndistinct was added
specifically for this purpose. Can you provide a specific case and code lines?


3. I think you should initialize the variable ndist in your patch. I 
faced the compile complaining during your code compilation.


costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized 
[-Werror=maybe-uninitialized]

  6694 | em->em_ndistinct = ndist;
   | ~^
costsize.c:6680:33: note: ‘ndist’ was declared here
  6680 | double  ndist;
   | ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [: costsize.o] Error 1
I think you can just update your compiler. But I added the ndist 
initialisation to make more compilers happy :).



+        Assert (node != NULL);
+
          examine_variable(root, node, 0, &vardata);
          if (!HeapTupleIsValid(vardata.statsTuple))
              continue;
I don't think so, at least until you provide a case where the
get_sortgroupclause_expr function returns NULL.
What's more, remember that patch 0004 here is just to show the
perspective and is still under construction.
Anyway, thanks, I found out that the patch set doesn't apply correctly 
because of 828e94c. So, see the new version in the attachment.


--
regards, Andrei Lepikhov
From 5eb884cbbd9c2e356d5d855da46d7e62d101b8b9 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Tue, 29 Oct 2024 08:49:33 +0700
Subject: [PATCH v2 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, which is why the order of columns in the sort
operation may impact the number of calls to the comparison function.
Sorting has only pathkeys as input for the cost estimation. This patch, instead
of blindly choosing the first equivalence class member, attempts to find the
expression with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator is not a big
deal. But in that case it would be necessary to pass the underlying relids to
the cost calculation routine, which would cause an API change. So, here we stay
as simple as possible.

Add the number of distinct values - em_ndistinct - into EquivalenceMember.
It may additionally be used later in group number estimations.
---
 src/backend/optimizer/path/costsize.c | 72 +--
 src/backend/optimizer/path/equivclass.c   |  1 +
 src/include/nodes/pathnodes.h |  2 +
 .../regress/expected/incremental_sort.out | 51 +
 src/test/regress/sql/incremental_sort.sql | 31 
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..686d5883d1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -203,6 +203,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+ EquivalenceClass *ec);
 
 
 /*
@@ -2052,22 +2054,21 @@ cost_incremental_sort(Path *path,
 	 */
 	foreach(l, pathkeys)
 	{
-		PathKey*key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		PathKey			   *key = (PathKey *) lfirst(l);
+		EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars wi

Re: Add isolation test template in injection_points for wait/wakeup/detach

2024-10-28 Thread Michael Paquier
On Mon, Oct 28, 2024 at 07:17:28AM +, Bertrand Drouvot wrote:
> I think that we cannot be 100% sure that the s1 wait will finish before the
> s2 detach (easily reproducible with gdb attached to s1 or a hardcoded sleep)
> and that other OSes could also report the test as failing for the same reason.

Yes, the only safe thing we can do in this test is to let the wakeup2
step be last, as we are sure that the isolationtester is going to wait for
s2 to finish the call of the wakeup function before moving on to
checking the end of the wait.

> It's not ideal, but instead of removing this first permutation test, what
> about adding a "sleep2" step to it (doing, say, SELECT pg_sleep(1);) and
> calling this new step before the detach2 one?

There is no real guarantee of stability.  Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client.  And
hardcoded sleeps make the test slower even on fast machines.

What we have here seems to be just contention on Cirrus with the
FreeBSD hosts, while there is much more stability with the Linux hosts.
--
Michael


signature.asc
Description: PGP signature


Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Yasir
On Mon, Oct 28, 2024 at 8:16 PM Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > The patch looks good to me, except the name of the new member.
>
> >   CommonTableExpr *p_parent_cte; /* this query's containing CTE */
> > + Alias*p_parent_alias; /* parent's alias for this query */
>
> > the two "parent"s here mean different things and that might lead one
> > to assume that the p_parent_alias refers to alias of CTE. The comment
> > adds to the confusion since it mentions parent. How about renaming it
> > as p_outer_alias? or something which indicates alias of the outer
> > query?
>
> Hmm, I figured the two "parent" references do mean the same thing,
> ie the immediately surrounding syntactic construct.  While I won't
> fight hard about it, I don't see an advantage in naming the new
> field differently.  We could make the comment be
>
> /* outer level's alias for this query */


This seems ok to me.




> if that helps any.
>
> regards, tom lane
>


Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
We don't seem to be agreeing on much here... :-(

On Tue, 29 Oct 2024 at 13:30, David G. Johnston
 wrote:
>
> On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:
>> I'm concerned about the wording "all index-scan related".  It's not
>> that clear if that would include Bitmap Index Scans or not.
>
>
> That was partially the point of writing "all" there - absent other 
> information, and seeing how index-only scans were treated, I presumed it was 
> indeed actually or effectively a switch for all.  If it is not it should be 
> made clear which node types with the word index in them are not affected.

I'm very much against mentioning which things are *not* affected by
settings. It doesn't seem like a very sustainable way to write
documentation.

>> I think
>> it's better to explicitly mention index-only-scans to make it clear
>> which nodes are affected.
>
> I hadn't considered Bitmap Index Scans but I would expect if you do not use 
> index scans then the ability to produce bitmaps from them would be precluded.
>
> I could see pointing out, in enable_bitmapscan, that enable_bitmapscan is 
> effectively disabled (for index inputs) when enable_indexscan is set to off.  
> Then, in enable_indexscan, add a "see also" to enable_bitmapscan with a brief 
> reason as well.

I don't follow this. enable_bitmapscan is completely independent from
enable_indexscan.

> Is there a listing of all node types produced by PostgreSQL (with the explain 
> output naming) along with which ones are affected by which enable_* knobs 
> (possibly multiple for something like Bitmap Index Scan)?

No. We purposefully do our best not to document executor nodes. The
enable_* GUCs is one place where it's hard to avoid.

>>
>> +types. The default is on. The
>> index-only-scan plan types
>> +can be independently disabled by setting > linkend="guc-enable-indexonlyscan"/>
>> +to off.
>>
>> I wondered if it's better to reference the enable_indexonlyscan GUC
>> here rather than document what enable_indexonlyscan does from the
>> enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
>> could be added?
>
>
> I prefer to briefly explain why we advise the reader to go "see also" here.
>
>>
>> -The default is on.
>> +The default is on. However, this setting
>> has no effect if
>> + is set to
>> off.
>>
>> Could we just add "The  setting
>> must also be enabled to have the query planner consider
>> index-only-scans"?
>
>
> I'd like to stick with a conjunction there but agree the "must be enabled" 
> wording is preferable, avoiding the double-negative.
>
> "The default is on, but the  setting must also be enabled."
>
> The 'to have the...' part seems to just be redundant.

I think it's confusing to include this as part of the mention of what
the default value is. The default value and enable_indexscans being
the master switch aren't at all related.

David




Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 7:44 AM Peter Smith  wrote:
>
> ==
> src/backend/replication/logical/proto.c
>
> 2.
> +bool
> +logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (!columns && att->attgenerated)
> + return false;
> +
> + /*
> + * Check if a column is covered by a column list.
> + */
> + if (columns && !bms_is_member(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> I thought this could be more simply written as:
>
> {
> if (att->attisdropped)
>   return false;
>
> /* If a column list was specified only publish the specified columns. */
> if (columns)
>   return bms_is_member(att->attnum, columns);
>
> /* If a column list was not specified publish everything except
> generated columns. */
> return !att->attgenerated;
> }
>

Your version is difficult to follow compared to what is proposed in
the current patch. It is a matter of personal choice, so I leave it to
the author (or others) which one they prefer. However, I suggest that
we add extra comments in the current patch where we return true at the
end of the function and also at the top of the function.
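For example, something along these lines (the comment wording is only
illustrative; the function body is the one from the current patch):

/*
 * Decide whether the given column should be published.
 */
bool
logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
{
	if (att->attisdropped)
		return false;

	/*
	 * Skip publishing generated columns if they are not included in the
	 * column list.
	 */
	if (!columns && att->attgenerated)
		return false;

	/*
	 * Check if a column is covered by a column list.
	 */
	if (columns && !bms_is_member(att->attnum, columns))
		return false;

	/* All checks passed, so the column is published. */
	return true;
}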

>
> ==
> src/test/regress/sql/publication.sql
>
> 4.
>  ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> +-- ok: generated column "d" can be in the list too
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
> +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
> instead of ADD TABLE, so then you can remove the earlier DROP and DROP
> the table only once.
>

Yeah, we can do that if we want, but let's not add a dependency on
the previous test. Separate tests make it easier to extend the tests
in the future. Now, if it would have saved a noticeable amount of
time, then we could have considered it. Having said that, we can keep
both columns a and d in the column list.

> ==
> src/test/subscription/t/031_column_list.pl
>
> 5.
> +# TEST: Dropped columns are not considered for the column list, and generated
> +# columns are not replicated if they are not explicitly included in the 
> column
> +# list. So, the publication having a column list except for those columns 
> and a
> +# publication without any column (aka all columns as part of the columns 
> list)
> +# are considered to have the same column list.
>
> Hmm. I don't think this wording is quite right "without any column".
> AFAIK the original intent of this test was to prove only that
> dropped/generated columns were ignored for the NULL column list logic.
>
> That last sentence maybe should say more like:
>
> So a publication with a column list specifying all table columns
> (excluding only dropped and generated columns) is considered to be the
> same as a publication that has no column list at all for that table.
>

I think you are saying the same thing in slightly different words.
Both of those sound correct to me. So not sure if we get any advantage
by changing it.

-- 
With Regards,
Amit Kapila.




Re: Questions About TODO: Issuing NOTICEs for row count differences in EXPLAIN ANALYZE

2024-10-28 Thread David Rowley
On Tue, 29 Oct 2024 at 12:43, KAZAR Ayoub  wrote:
> I am currently looking into the following TODO item, "Have EXPLAIN ANALYZE 
> issue NOTICE messages when the estimated and actual row counts differ by a 
> specified percentage."
> What's the current status of this TODO, and is there any prior discussion or 
> rationale behind it ?

The status is that we don't have anything like that and I don't recall
it being mentioned that anyone is working on it.  Normally these items
only get added when there has been some discussion about it, but
normally that discussion gets linked along with the todo item. Clearly
that's not been done in this case.  I imagine the rationale is to make
it more clear when the estimates are off from the actual execution.

You might need to do some digging into the history of who added that
todo item and see if you can find any relevant discussion on hackers
around the time it was added.

> Specifically:
> - How can we come up with a percentage for row count differences (fixed value 
> or dynamic)?
> - Should we consider a configurable param for users ?
> - Is there anything extra to consider ?

The biggest thing to consider is if we'd want anything like this in
core PostgreSQL. It feels more like something additional tooling such
as explain.depesz.com would concern themselves with. I could also
imagine features along those lines in some sort of statistics advisor
contrib module. My personal view is that it would feel like a very
misplaced feature if we were to add only what the todo item describes
into core PostgreSQL. In any case, adding a NOTICE for this seems
horrible. Doing it that way means the information about the row
estimate's accuracy is very disconnected from the EXPLAIN line that it
belongs to.

Additionally, there are cases where we expect the actual and estimated
row counts to differ, even with perfect statistics. Consider the Seq Scan
in the following:

postgres=# explain analyze select * from pg_class limit 1;
 QUERY PLAN
-
 Limit  (cost=0.00..0.04 rows=1 width=273) (actual time=0.035..0.036
rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..18.15 rows=415 width=273)
(actual time=0.033..0.033 rows=1 loops=1)

I don't think we'd want false alarms for cases like that.

David




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:

> On Wed, 23 Oct 2024 at 13:51, David G. Johnston
>  wrote:
> > Went with a slightly different wording that seems to flow better with
> the xrefs I added between the two options.
>
> -Enables or disables the query planner's use of index-scan plan
> -types. The default is on.
> +Enables or disables the query planner's use of all index-scan
> related plan
>
> I'm concerned about the wording "all index-scan related".  It's not
> that clear if that would include Bitmap Index Scans or not.


That was partially the point of writing "all" there - absent other
information, and seeing how index-only scans were treated, I presumed it
was indeed actually or effectively a switch for all.  If it is not it
should be made clear which node types with the word index in them are not
affected.

I think
> it's better to explicitly mention index-only-scans to make it clear
> which nodes are affected.
>

I hadn't considered Bitmap Index Scans but I would expect if you do not use
index scans then the ability to produce bitmaps from them would be
precluded.

I could see pointing out, in enable_bitmapscan, that enable_bitmapscan is
effectively disabled (for index inputs) when enable_indexscan is set to
off.  Then, in enable_indexscan, add a "see also" to enable_bitmapscan with
a brief reason as well.

Is there a listing of all node types produced by PostgreSQL (with the
explain output naming) along with which ones are affected by which enable_*
knobs (possibly multiple for something like Bitmap Index Scan)?


> +types. The default is on. The
> index-only-scan plan types
> +can be independently disabled by setting  linkend="guc-enable-indexonlyscan"/>
> +to off.
>
> I wondered if it's better to reference the enable_indexonlyscan GUC
> here rather than document what enable_indexonlyscan does from the
> enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
> could be added?
>

I prefer to briefly explain why we advise the reader to go "see also" here.


> -The default is on.
> +The default is on. However, this setting
> has no effect if
> + is set to
> off.
>
> Could we just add "The  setting
> must also be enabled to have the query planner consider
> index-only-scans"?
>

I'd like to stick with a conjunction there but agree the "must be enabled"
wording is preferable, avoiding the double negative.

"The default is on, but the  setting must also be enabled."

The 'to have the...' part seems to just be redundant.

David J.


Re: Statistics Import and Export

2024-10-28 Thread Jeff Davis
On Sun, 2024-10-27 at 14:00 +0300, Alexander Lakhin wrote:
> Please look at the following seemingly atypical behavior of the new
> functions:

...

> SELECT pg_restore_attribute_stats(
>    'relation', 'test'::regclass,
>    'attname', 'id'::name,
>    'inherited', false
> ) FROM generate_series(1, 2);
> ERROR:  XX000: tuple already updated by self

Thank you for the report!

Attached a patch to add calls to CommandCounterIncrement().
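
For context, the general rule being applied here is that when a catalog change
made earlier in the same SQL statement must be visible to a later operation in
that statement, the command counter has to be advanced.  A minimal sketch of
that pattern (not the actual patch; the function and variable names are
illustrative only):

#include "postgres.h"

#include "access/htup.h"
#include "access/xact.h"
#include "catalog/indexing.h"
#include "utils/rel.h"

/* Sketch only: update one catalog row and make it visible to later calls. */
static void
update_catalog_row(Relation catrel, HeapTuple newtup)
{
	CatalogTupleUpdate(catrel, &newtup->t_self, newtup);

	/*
	 * Without this, a second invocation within the same statement (as in the
	 * generate_series(1, 2) example above) still sees the old tuple version
	 * and tries to update it again, producing "tuple already updated by
	 * self".
	 */
	CommandCounterIncrement();
}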

Regards,
Jeff Davis

From ccff3df8b2d6f0c139a39e5aef8721b4480bdbd3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 28 Oct 2024 18:16:09 -0700
Subject: [PATCH] Add missing CommandCounterIncrement() in stats import
 functions.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/98b2fcf0-f701-369e-d63d-6be9739ce...@gmail.com
---
 src/backend/statistics/attribute_stats.c | 11 ---
 src/backend/statistics/relation_stats.c  |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/attribute_stats.c b/src/backend/statistics/attribute_stats.c
index af61fd79e4..4ae0722b78 100644
--- a/src/backend/statistics/attribute_stats.c
+++ b/src/backend/statistics/attribute_stats.c
@@ -752,6 +752,8 @@ upsert_pg_statistic(Relation starel, HeapTuple oldtup,
 	}
 
 	heap_freetuple(newtup);
+
+	CommandCounterIncrement();
 }
 
 /*
@@ -762,6 +764,7 @@ delete_pg_statistic(Oid reloid, AttrNumber attnum, bool stainherit)
 {
 	Relation	sd = table_open(StatisticRelationId, RowExclusiveLock);
 	HeapTuple	oldtup;
+	bool		result = false;
 
 	/* Is there already a pg_statistic tuple for this attribute? */
 	oldtup = SearchSysCache3(STATRELATTINH,
@@ -773,12 +776,14 @@ delete_pg_statistic(Oid reloid, AttrNumber attnum, bool stainherit)
 	{
 		CatalogTupleDelete(sd, &oldtup->t_self);
 		ReleaseSysCache(oldtup);
-		table_close(sd, RowExclusiveLock);
-		return true;
+		result = true;
 	}
 
 	table_close(sd, RowExclusiveLock);
-	return false;
+
+	CommandCounterIncrement();
+
+	return result;
 }
 
 /*
diff --git a/src/backend/statistics/relation_stats.c b/src/backend/statistics/relation_stats.c
index 5a2aabc921..ed5dea2e05 100644
--- a/src/backend/statistics/relation_stats.c
+++ b/src/backend/statistics/relation_stats.c
@@ -171,6 +171,8 @@ relation_statistics_update(FunctionCallInfo fcinfo, int elevel)
 	/* release the lock, consistent with vac_update_relstats() */
 	table_close(crel, RowExclusiveLock);
 
+	CommandCounterIncrement();
+
 	return result;
 }
 
-- 
2.34.1



Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
On Tue, 29 Oct 2024 at 14:41, David G. Johnston
 wrote:
>
> On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:
>  -Enables or disables the query planner's use of index-scan plan
> -types. The default is on.
> +Enables or disables the query planner's use of index-scan and
> +index-only-scan plan types.  The default is on.
> +Also see .
>
> I think the original wording "index-scan plan types" is what is confusing me. 
>  The plural types is turning index-scan plan into a category of plans rather 
> than the single plan type "index scan".
>
> Your proposed wording actually (accidentally?) fixes this because now the 
> plural types actually refers to two individual plan nodes, "index scan" and 
> "index-only scan".

I can't really vouch for the original wording as I didn't write it. I
agree the original use of "types" as a plural is strange and it's not
all that clear what that includes. Perhaps it was an attempt to mean
index and index-only scans.

> The hyphenation still reads a bit odd but ok.

I'm not sure where the hyphenated form of "index-scan" comes from and
I admit that I blindly copied that form when I wrote
"index-only-scans". I'd much prefer we used Index
Scan and Index Only Scan so it could more
easily be matched up to what's shown in EXPLAIN. I don't think it's up
to this patch to change that, so I've just copied the existing form. I
was also warned off using the node name from EXPLAIN in [1], and on
checking the validity of the complaint, it seems valid.

> I am ok with this revision (and the patch as a whole) I suppose but I still 
> feel like something is missing here.  Though probably that something would 
> fit better in an overview page rather than trying to get the settings to 
> explain all this to the reader.

Thanks. I'll go make it happen. It seems worthy of a backpatch because
it seems equally applicable there, plus to maintain consistency.

For the part that seems missing... I'm not sure it's a great excuse,
but we've often been quite bad at updating the documentation when
making changes to the executor or EXPLAIN.  Tom fixed a bunch of stuff
in 5caa05749 which was outdated.  I think if we wanted to try and do a
better job of documenting plan choices and EXPLAIN output, we'd need
to consider if the said documentation is worth the additional
maintenance burden. It might be quite hard to decide that unless
someone went and wrote the documentation first so that we could
consider it on its own merit. Whoever does that would have to be
willing to have the whole work rejected if we decided it wasn't worth
the trouble. It seems like a bit of a thankless task and I'm not
motivated to do it. Your pain threshold might be higher than mine,
however.

David

[1] 
https://www.postgresql.org/message-id/ccbe8ab940da76d388af7fc3fd169f1dedf751f6.ca...@cybertec.at




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Hi~

Sorry for sending multiple comments in separate mails. I just found that the
initialization seems redundant, since we have used palloc0?

> +istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> +istate->bistate = NULL;
> +istate->mistate = NULL;
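
(For reference: palloc0 returns zero-filled memory, so the explicit
assignments are indeed redundant. A small illustration, where HeapInsertState
is the struct from the patch under review:

	/* palloc0 zero-fills the allocation, so pointer members start out NULL */
	HeapInsertState *istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));

	Assert(istate->bistate == NULL);	/* explicit NULL assignment is redundant */
	Assert(istate->mistate == NULL);
)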

---
Regards, Jingtang




Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

2024-10-28 Thread jian he
On Thu, Oct 24, 2024 at 3:01 PM Tender Wang  wrote:
>
> I feel that it's hard only to use one struct(for example, X), which just 
> calls equal(X, expr)
> can check both the expression match and the collation match.
>

in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.

> Maybe we should add another collation match checks in 
> match_clause_to_partition_key(), like
> partition pruning logic does.
>
in match_clause_to_partition_key
we already have

else if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
 * Partition key match also requires collation match.  There may be
 * multiple partkeys with the same expression but different
 * collations, so failure is NOMATCH.
 */
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.

I think the problem is that match_expr_to_partition_keys
doesn't have a collation-related check.

CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);

Our partition-wise join is based on Equi-join [1].
In some cases the column and partition-key collations are different,
but if both collations are deterministic, then texteq should work
as expected.
So I think pagg_join3 can do a partition-wise join,
and I think pagg_join2 can do a partition-wise join as well.

We can either say that none of them (pagg_join1, pagg_join2, pagg_join3,
pagg_join4) can do a partition-wise join (with themselves),
or we can let pagg_join2 and pagg_join3 do partition-wise joins (with
themselves).


POC attached; it lets pagg_join2 and pagg_join3 do partition-wise joins.


[1] https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..6214d01794 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -74,7 +74,7 @@ static bool have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
    RelOptInfo *rel1, RelOptInfo *rel2,
    JoinType jointype, List *restrictlist);
 static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-		 bool strict_op);
+		 bool strict_op, bool *coll_inderministic);
 static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 			RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 			JoinType jointype);
@@ -2104,6 +2104,7 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		Expr	   *expr1;
 		Expr	   *expr2;
 		bool		strict_op;
+		bool		coll_inderministic = false;
 		int			ipk1;
 		int			ipk2;
 
@@ -2167,10 +2168,11 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		 * Only clauses referencing the partition keys are useful for
 		 * partitionwise join.
 		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op, &coll_inderministic);
 		if (ipk1 < 0)
 			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+
+		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op, &coll_inderministic);
 		if (ipk2 < 0)
 			continue;
 
@@ -2181,6 +2183,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		if (ipk1 != ipk2)
 			continue;
 
+		/* if either collation is inderministic, cannot do partitionwise join */
+		if (coll_inderministic)
+			return false;
+
 		/* Ignore clause if we already proved these keys equal. */
 		if (pk_known_equal[ipk1])
 			continue;
@@ -2296,9 +2302,12 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
  * strict_op must be true if the expression will be compared with the
  * partition key using a strict operator.  This allows us to consider
  * nullable as well as nonnullable partition keys.
+ * coll_inderministic return true if exprCollation(expr) is inderministic. if
+ * expr is inderministic, that means same value with different apperance can
+ * live in different partition. In that case, we cannot do partition-wise join.
  */
 static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op, bool *coll_inderministic)
 {
 	int			cnt;
 
@@ -2319,7 +2328,15 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 		foreach(lc, rel->partexprs[cnt])
 		{
 			if (equal(l

Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 11:19 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> 01. fetch_remote_table_info()
>
> `bool *remotegencolpresent` is accessed unconditionally, but it can cause 
> crash
> if NULL is passed to the function. Should we add an Assert to verify it?
>

This is a static function called from just one place, so I don't
think this is required.

> 02. fetch_remote_table_info()
>
> ```
> +if (server_version >= 18)
> +*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, 
> &isnull));
> +
> ```
>
> Can we add Assert(!isnull) like other parts?
>
> 03. fetch_remote_table_info()
>
> Also, we do not have to reach here once *remotegencolpresent becomes true.
> Based on 02 and 03, how about below?
>
> ```
> if (server_version >= 18 && !(*remotegencolpresent))
> {
> *remotegencolpresent |= 
> DatumGetBool(slot_getattr(slot, 5, &isnull));
> Assert(!isnull);
> }
> ```
>

Yeah, we can follow this suggestion, but it would be better to add a comment explaining it.
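
A hedged sketch of what the suggested check might look like with such a
comment (the column position and variable name come from the patch under
review and are illustrative, not an existing core code path):

	if (server_version >= 18 && !(*gencol_published))
	{
		/*
		 * Column 5 of the remote query's result tells us whether the column
		 * list publishes any generated column; once it has been found true,
		 * there is no need to check the remaining rows.
		 */
		*gencol_published = DatumGetBool(slot_getattr(slot, 5, &isnull));
		Assert(!isnull);
	}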

-- 
With Regards,
Amit Kapila.




RE: Pgoutput not capturing the generated columns

2024-10-28 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch! Here are my comments for v44.

01. fetch_remote_table_info()

`bool *remotegencolpresent` is accessed unconditionally, but it can cause a crash
if NULL is passed to the function. Should we add an Assert to verify it?

02. fetch_remote_table_info()

```
+if (server_version >= 18)
+*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, 
&isnull));
+
```

Can we add Assert(!isnull) like other parts?

03. fetch_remote_table_info()

Also, we do not have to reach here once *remotegencolpresent becomes true.
Based on 02 and 03, how about below?

```
if (server_version >= 18 && !(*remotegencolpresent))
{
*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 
5, &isnull));
Assert(!isnull);
}
```

04. pgoutput_column_list_init()

+if (att->attgenerated)
+{
+if (bms_is_member(att->attnum, cols))
+gencolpresent = true;
+
 continue;
+}

I'm not sure it is correct. Why do you skip the generated column even when it
is in the column list? Also, can you add comments explaining what you want to do?

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
Here are my review comments for patch v44-0002.

==
Commit message.

1.
The commit message is missing.

==
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:

2.
+fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
+ List **qual, bool *remotegencolpresent)

The name 'remotegencolpresent' sounds like it means a generated col is
present in the remote table, but don't we only care when it is being
published? So, would a better parameter name be more like
'remote_gencol_published'?

~~~

3.
Would it be better to introduce a new human-readable variable like:
bool check_for_published_gencols = (server_version >= 18);

because then you could use that instead of having the 18 check in
multiple places.

~~~

4.
-   lengthof(attrRow), attrRow);
+   server_version >= 18 ? lengthof(attrRow) : lengthof(attrRow) -
1, attrRow);

If you wish, that length calculation could be written more concisely like:
lengthof(attrRow) - (server_version >= 18 ? 0 : 1)

~~~

5.
+ if (server_version >= 18)
+ *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
+

Should this also say Assert(!isnull)?

==
src/test/subscription/t/031_column_list.pl

6.
+ qq(0|1),
+ 'replication with generated columns in column list');

Perhaps this message should be worded slightly differently, to
distinguish it from the "normal" replication message.

/replication with generated columns in column list/initial replication
with generated columns in column list/

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

2024-10-28 Thread Tender Wang
jian he  wrote on Tue, 29 Oct 2024 at 14:15:

> On Thu, Oct 24, 2024 at 3:01 PM Tender Wang  wrote:
> >
> > I feel that it's hard only to use one struct(for example, X), which just
> calls equal(X, expr)
> > can check both the expression match and the collation match.
> >
>
> in RelOptInfo->partexprs, maybe we should mention that the partition
> key collation is stored
> in RelOptInfo->part_scheme, not here.
>
> > Maybe we should add another collation match checks in
> match_clause_to_partition_key(), like
> > partition pruning logic does.
> >
> in match_clause_to_partition_key
> we already have
>
> else if (IsA(clause, OpExpr) &&
> list_length(((OpExpr *) clause)->args) == 2)
> {
> /*
>  * Partition key match also requires collation match.  There may be
>  * multiple partkeys with the same expression but different
>  * collations, so failure is NOMATCH.
>  */
> if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
> return PARTCLAUSE_NOMATCH;
> }
> else if (IsA(clause, ScalarArrayOpExpr))
> {
> if (!equal(leftop, partkey) ||
> !PartCollMatchesExprColl(partcoll, saop->inputcollid))
> return PARTCLAUSE_NOMATCH;
> }
> So I think match_clause_to_partition_key handling collation is fine.
>
> I think the problem is match_expr_to_partition_keys
> don't have a collation related check.
>

Sorry, it's a typo. It should be  match_expr_to_partition_keys().


> CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
> LIST(c collate "C");
> CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
> collate case_insensitive);
> CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
> collate "C");
> CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
> LIST(c collate ignore_accents);
>
> Our partition-wise join is based on Equi-join [1].
> In some cases,column and partitionkey collation are different,
> but if these two collations are deterministic, then texteq should work
> as expected.
> So I think, pagg_join3 can do partition-wise join,
> I think pagg_join2 can do partition-wise join also.
>
> we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
> do partition-wise join (join with themself),
> or we can let pagg_join2, pagg_join3 do partition-wise join (join with
> themself).
>
>
> POC attached, will let pagg_join2, pagg_join3 do partition-wise join.
>

Hmm, I'm not sure


>
> [1] https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join
>


-- 
Thanks,
Tender Wang


Re: detoast datum into the given buffer as a optimization.

2024-10-28 Thread Andy Fan

Hi,

I found some independent improvement in this area. In detoast_attr,
VARATT_IS_EXTERNAL_ONDISK and VARATT_IS_EXTERNAL_INDIRECT are checked
first, and then VARATT_IS_EXTERNAL_EXPANDED is checked. However, when
VARATT_IS_EXTERNAL_EXPANDED is true, detoast_external_attr is called,
which checks the previous two cases again. The attached patch uses
more specific code to handle this.

This not only removes the double-check overhead, but also makes
my following patch easier, since I don't need to build a
detoast_external_attr_[buffer] function.

'make check-world' passed.

-- 
Best Regards
Andy Fan

>From 7208142241ac18e44be2ee87e9c83c451032ca95 Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Tue, 29 Oct 2024 14:05:05 +0800
Subject: [PATCH v20241029 1/1] Using more specific code when detoasting an
 expanded datum.

In the detoast_attr function, VARATT_IS_EXTERNAL_ONDISK and
VARATT_IS_EXTERNAL_INDIRECT are checked first, and then
VARATT_IS_EXTERNAL_EXPANDED is checked, However it is true,
detoast_external_attr is called which would check the two cases
again. The attached patch uses a more specific code to handle this.
---
 src/backend/access/common/detoast.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index 3547cdba56..5f191a83b5 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -161,9 +161,15 @@ detoast_attr(struct varlena *attr)
 		/*
 		 * This is an expanded-object pointer --- get flat format
 		 */
-		attr = detoast_external_attr(attr);
-		/* flatteners are not allowed to produce compressed/short output */
-		Assert(!VARATT_IS_EXTENDED(attr));
+		ExpandedObjectHeader *eoh;
+		Size		resultsize;
+		struct varlena *result;
+
+		eoh = DatumGetEOHP(PointerGetDatum(attr));
+		resultsize = EOH_get_flat_size(eoh);
+		result = (struct varlena *) palloc(resultsize);
+		EOH_flatten_into(eoh, (void *) result, resultsize);
+		attr = result;
 	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
-- 
2.45.1



Re: -Wformat-signedness

2024-10-28 Thread Michael Paquier
On Tue, Oct 29, 2024 at 07:38:36AM +0100, Peter Eisentraut wrote:
> I think it could be useful to set up some better test coverage for various
> things overflowing signed integer maximums.  For example, maybe you could
> hack initdb to advance the OID counter to INT32_MAX+1 or thereabouts and run
> the test suites from there.  That would also catch things like inappropriate
> uses of atoi(), things beyond just the format strings.

Fun.  One way to be fancy here would be to force a pg_resetwal
--next-oid in some of the test paths (Cluster.pm and/or pg_regress),
using an environment variable to make the command trigger across
the board for all the clusters created in the tests.  initdb cannot be
used here as the TAP tests reuse a cluster already initdb'd to save
time.  No need to touch pg_regress, either, as we could count on the
pg_regress runs in 002_pg_upgrade.pl and 027_stream_regress.pl.
--
Michael




Re: Reduce one comparison in binaryheap's sift down

2024-10-28 Thread cca5507
Agreed, a new version of the patch is attached.


--
Regards,
ChangAo Chen





v2-0001-Reduce-one-comparison-in-binaryheap-s-sift-down.patch
Description: Binary data


Re: Forbid to DROP temp tables of other sessions

2024-10-28 Thread Daniil Davydov
Hi,
Thanks for your comments, I appreciate them.

As I continued to deal with the topic of working with temp tables of
other sessions, I noticed something like a bug. For example
(REL_17_STABLE):
Session 1:
=# CREATE TEMP TABLE test(id int);

Session 2:
=# INSERT INTO pg_temp_0.test VALUES (1);
=# INSERT INTO pg_temp_0.test VALUES (2);

The second INSERT command will end with the error "cannot access temporary
tables of other sessions". I checked why this is happening and found
errors in several places.
So, I attach two files to this email:
1) Isolation test, that shows an error in REL_17_STABLE (iso_1.patch)
2) Patch that fixes code that mistakenly considered temporary tables
to be permanent (I will be glad to receive feedback on these fixes) +
isolation test, which shows that now any action with temp table of
other session leads to error (temp_tbl_fix.patch)

The tests look kinda ugly, but I think that's inevitable, given that we
don't know exactly what the name of the other session's temporary schema
will be.

--
Best regards,
Daniil Davydov
From 995002793b0af57b660565e6027615c083d8ce5e Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Mon, 28 Oct 2024 17:19:07 +0700
Subject: [PATCH] Add isolaton test for work with other session's temp table

---
 src/test/isolation/expected/temp-tables.out |  75 +++
 src/test/isolation/isolation_schedule   |   1 +
 src/test/isolation/specs/temp-tables.spec   | 100 
 3 files changed, 176 insertions(+)
 create mode 100644 src/test/isolation/expected/temp-tables.out
 create mode 100644 src/test/isolation/specs/temp-tables.spec

diff --git a/src/test/isolation/expected/temp-tables.out b/src/test/isolation/expected/temp-tables.out
new file mode 100644
index 00..dbd6ffc2e5
--- /dev/null
+++ b/src/test/isolation/expected/temp-tables.out
@@ -0,0 +1,75 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_st1 s1_st2 s2_st1 s1_st3 s2_st2
+step s1_st1: CREATE TEMP TABLE test_tmp(id int);
+step s1_st2: INSERT INTO temp_schema_id SELECT pg_my_temp_schema();
+s2: NOTICE:  cannot access temporary tables of other sessions
+s2: NOTICE:  cannot access temporary tables of other sessions
+step s2_st1: 
+DO $$
+DECLARE
+schema_name text;
+BEGIN
+-- Find out name of temporary schema of first session
+SELECT nspname INTO schema_name
+  FROM pg_namespace
+ WHERE oid = (SELECT oid FROM temp_schema_id LIMIT 1);
+
+-- Execute few insert operations. We expect that behavior to be the
+-- same (both will complete successfully or both will fail). Since
+-- we have RELATION_IS_OTHER_TEMP() check in PrefetchBuffer and
+-- ReadBufferExtended functions (src/backend/storage/buffer/bufmgr.c),
+-- let's assume that both operations must fail (this is reflected
+-- in expected file)
+BEGIN
+EXECUTE format('INSERT INTO %I.test_tmp VALUES (1);', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+
+BEGIN
+EXECUTE format('INSERT INTO %I.test_tmp VALUES (2);', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+END
+$$;
+
+step s1_st3: INSERT INTO test_tmp VALUES (3);
+s2: NOTICE:  (3)
+s2: NOTICE:  cannot access temporary tables of other sessions
+s2: NOTICE:  cannot access temporary tables of other sessions
+step s2_st2: 
+DO $$
+DECLARE
+schema_name text;
+result RECORD;
+BEGIN
+-- Find out name of temporary schema of first session
+SELECT nspname INTO schema_name
+  FROM pg_namespace
+ WHERE oid = (SELECT oid FROM temp_schema_id LIMIT 1);
+
+-- Before this step call, first session inserted few tuples into
+-- test_tmp table. Let's assume that SELECT result must contain all
+-- of these tuples (based on current logic)
+FOR result IN
+   EXECUTE format('SELECT * FROM %I.test_tmp;', schema_name)
+  LOOP
+   RAISE NOTICE '%', result;
+   END LOOP;
+
+-- Now lets try to update or delete tuples from test_tmp. If we
+-- cannot insert into this table, lets assume that both UPDATE and
+-- DELETE operations must return same error as INSERT
+BEGIN
+EXECUTE format('UPDATE %I.test_tmp SET id = 100 WHERE id = 3;', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+ 

RE: [PoC] Partition path cache

2024-10-28 Thread Bykov Ivan
Hello

> This sounds like an interesting idea, I like it because it omit the needs for 
> "global statistics" effort for partitioned table since it just use the first 
> partition it knows. Of couse it has its drawback that "first"
> partition can't represent other partitions.

This method uses global statistics for all partitions. 
The cache uses standard path building functions (it calculates selectivity for 
path), but it avoids calling all of them for the second and later partitions in 
a group.

The concept is similar to the GEQO method used for joins.
We skip creating some path variants if building all paths would take too long.

> One of the Arguments of this patch might be "What if other partitions have a 
> pretty different statistics from the first partition?". If I were you, I 
> might check all the used statistics on this stage and try to find out a 
> similar algorithms to > prove that the best path would be similar too. This 
> can happens once when the statistics is gathered. However this might be not 
> easy.

Yes, maybe we can group partitions not only by their available index lists
but also by ranges of some statistical properties.

--
Best Regards
Ivan Bykov



Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 11:30 AM Peter Smith  wrote:
> ==
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
>
> 2.
> +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation 
> *lrel,
> + List **qual, bool *remotegencolpresent)
>
> The name 'remotegencolpresent' sounds like it means a generated col is
> present in the remote table, but don't we only care when it is being
> published? So, would a better parameter name be more like
> 'remote_gencol_published'?
>

I feel there is no need to add 'remote' to this variable name, as the function
name itself makes that clear. Both in the function definition and at
the caller site, we can name it 'gencol_published'.

> ~~~
>
> 3.
> Would it be better to introduce a new human-readable variable like:
> bool check_for_published_gencols = (server_version >= 18);
>
> because then you could use that instead of having the 18 check in
> multiple places.
>

It is better to just add a comment, because such a variable would make this
part of the code difficult to enhance within the same version (18) if required.

> ~~~
>
> 4.
> -   lengthof(attrRow), attrRow);
> +   server_version >= 18 ? lengthof(attrRow) : lengthof(attrRow) -
> 1, attrRow);
>
> If you wish, that length calculation could be written more concisely like:
> lengthof(attrow) - (server_version >= 18 ? 0 : 1)
>

The current way of the patch seems easier to follow.

-- 
With Regards,
Amit Kapila.




Re: protocol-level wait-for-LSN

2024-10-28 Thread Peter Eisentraut

On 29.10.24 06:06, Tatsuo Ishii wrote:

The patch adds a protocol extension called _pq_.wait_for_lsn as well
as a libpq connection option wait_for_lsn to activate the same.  (Use
e.g., psql -d 'wait_for_lsn=1'.)

With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.


If other protocol extension X tries to add something to the
ReadyForQuery message too, what would happen?


I think one would have to define that somehow.  If it's useful, the 
additional fields of both extensions could be appended, in some defined 
order.  But this is an interesting question to think about.






Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Mon, Oct 28, 2024 at 12:27 PM Peter Smith  wrote:
>
> On Mon, Oct 28, 2024 at 4:34 PM Amit Kapila  wrote:
> >
> > On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
> > >
> > > Hi, here are my review comments for patch v43-0001.
> > >
> > > ==
> > > src/backend/replication/logical/proto.c
> > >
> > > 2.
> > > +static bool
> > > +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> > > +{
> > > + if (att->attisdropped)
> > > + return false;
> > > +
> > > + /*
> > > + * Skip publishing generated columns if they are not included in the
> > > + * column list.
> > > + */
> > > + if (att->attgenerated && !columns)
> > > + return false;
> > > +
> > > + if (!column_in_column_list(att->attnum, columns))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > >
> > > Here, I wanted to suggest that the whole "Skip publishing generated
> > > columns" if-part is unnecessary because the next check
> > > (!column_in_column_list) is going to return false for the same
> > > scenario anyhow.
> > >
> > > But, unfortunately, the "column_in_column_list" function has some
> > > special NULL handling logic in it; this means none of this code is
> > > quite what it seems to be (e.g. the function name
> > > column_in_column_list is somewhat misleading)
> > >
> > > IMO it would be better to change the column_in_column_list signature
> > > -- add another boolean param to say if a NULL column list is allowed
> > > or not. That will remove any subtle behaviour and then you can remove
> > > the "if (att->attgenerated && !columns)" part.
> > >
> >
> > The NULL column list still means all columns, so changing the behavior
> > as you are proposing doesn't make sense and would make the code
> > difficult to understand.
> >
>
> My point was that the function 'column_in_column_list' would return
> true even when there is no publication column list at all, so that
> function name is misleading.
>
> And, because in patch 0001 the generated columns only work when
> specified via a column list it means now there is a difference
> between:
> - NULL (all columns specified in the column list) and
> - NULL (no column list at all).
>
> which seems strange and likely to cause confusion.
>

This is no more strange than it was before the 0001 patch. Also, the
comment atop the function clarifies the special condition of the
function. OTOH, I am fine with pulling the check outside the function, as
you are proposing, especially because now it is called from just one
place.

-- 
With Regards,
Amit Kapila.




Re: Add ExprState hashing for GROUP BY and hashed SubPlans

2024-10-28 Thread Andrei Lepikhov

On 9/1/24 18:49, David Rowley wrote:

adf97c156 added support to allow ExprStates to support hashing and
adjusted Hash Join to make use of that. That allowed a speedup in hash
value generation as it allowed JIT compilation of hash values. It also
allowed more efficient tuple deforming as all required attributes are
deformed in one go rather than on demand when hashing each join key.

The attached does the same for GROUP BY and hashed SubPlans. The win
for the tuple deformation does not exist here, but there does seem to
be some gains still to be had from JIT compilation.

Using a scale=1 TPC-H lineitem table, I ran the attached script.

The increase is far from impressive, but likely worth migrating these
over to use ExprState too.
Having remembered that SQL Server uses lightweight threads to execute 
massive hash and aggregate operations in parallel, I think this patch is 
promising. Unfortunately, it causes a SEGFAULT during 'make check'.


--
regards, Andrei Lepikhov





Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Tender Wang
Alvaro Herrera  wrote on Fri, 25 Oct 2024 at 23:14:

> On 2024-Oct-25, Alexander Lakhin wrote:
>
> > I've also discovered another anomaly with a similar setup, but it's not
> > related to 53af9491a.
>
> Hmm, it may well be a preexisting problem, but I do think it involves
> the same code.  As far as I can tell, the value "2" here
>
> > This script ends up with:
> > ERROR:  invalid attribute number 2
> > ERROR:  cache lookup failed for attribute 2 of relation 16398
>
> is coming from riinfo->confdelsetcols which was set up by
> DetachPartitionFinalize during the last DETACH operation.
>

Hmm, actually, confdelsetcols is always {2} both before and after the detach,
as below:

postgres=# select oid, conname, conrelid,conparentid,confdelsetcols from
pg_constraint where conrelid = 16397;
  oid  |  conname  | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
 16400 | pt_a_fkey |    16397 |       16392 | {2}
(1 row)

postgres=# ALTER TABLE pt DETACH PARTITION tp1;
ALTER TABLE
postgres=# select oid, conname, conrelid,conparentid,confdelsetcols from
pg_constraint where conrelid = 16397;
  oid  |  conname  | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
 16400 | pt_a_fkey |    16397 |           0 | {2}
(1 row)

Even without a detach, confdelsetcols is {2}, but no error is reported,
because rel->rd_att->natts of pt is 2.
It will not go into tp1 because tp1 is a partition of pt. But after the
detach, rel->rd_att->natts of tp1 is 1,
so "ERROR:  invalid attribute number 2" is reported.

CREATE TABLE tp1 ... ignores the dropped column of the parent, so the natts
of tp1 is 1, but its parent's is 2.

-- 
Thanks,
Tender Wang


Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Hi! Glad to see update in this thread.

A little question about the v24 0002 patch: would it be better to move the
implementation of TableModifyIsMultiInsertsSupported somewhere at the table
AM level?
It seems to be a common function for future use, not one specific to
matviews.

---

Regards, Jingtang


Bharath Rupireddy  wrote on Sat, 26 Oct 2024 at 21:31:

> Hi,
>
> Thanks for looking into this.
>
> On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis  wrote:
> >
> > I believe we need the branching in the caller anyway:
> >
> > 1. If there is a BEFORE row trigger with a volatile function, the
> > visibility rules[1] mean that the function should see changes from all
> > the rows inserted so far this command, which won't work if they are
> > still in the buffer.
> >
> > 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> > that the function should see all previous rows inserted.
> >
> > 3. If there are volatile functions in the target list or WHERE clause,
> > the same visibility semantics apply.
> >
> > 4. If there's a "RETURNING ctid" clause, we need to either come up with
> > a way to return the tuples after flushing, or we need to use the
> > single-tuple path. (Similarly in the future when we support UPDATE ...
> > RETURNING, as Matthias pointed out.)
> >
> > If we need two paths in each caller anyway, it seems cleaner to just
> > wrap the check for tuple_modify_buffer_insert in
> > table_modify_buffer_enabled().
> >
> > We could perhaps use a one path and then force a batch size of one or
> > something, which is an alternative, but we have to be careful not to
> > introduce a regression (and it still requires a solution for #4).
>
> I chose to branch in the caller e.g. if there's a volatile function
> SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
> table_tuple_insert() path, else multi-insert path.
>
> I am posting the new v24 patch set organized as follows: 0001
> introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
> using the new table AM for COPY ... FROM. I, for now, discarded the
> INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
> is to take the basic stuff forward.
>
> I reworked structure names, members and function names, reworded
> comments, addressed review comments in the v24 patches. Please have a
> look.
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
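
A hedged sketch of the caller-side branching described above; the
table_modify_buffer_* functions come from the proposed patch (they are not
existing core APIs), and the condition and state variables are illustrative
only:

	if (table_modify_buffer_enabled(mstate))	/* no volatile funcs, row triggers, RETURNING ctid, ... */
		table_modify_buffer_insert(mstate, slot);	/* buffered multi-insert path */
	else
		table_tuple_insert(rel, slot, mycid, ti_options, bistate);	/* existing single-tuple path */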


Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
> On Mon, Oct 28, 2024, at 08:56, jian he wrote:
>>   /* Check force_quote */
>> - if (!opts_out->csv_mode && (opts_out->force_quote ||
>> opts_out->force_quote_all))
>> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
>> + opts_out->force_quote_all))
>>   ereport(ERROR,
>>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>>
>> maybe this has a code indentation issue.
>> since "if" and "opts_out" in the same column position.
>
> Thanks for review.
>
> I've fixed the indentation issues.

I've now installed pgindent and will use it from here on, to avoid this class
of problems.

New version where all three patches are now indented using pgindent.

/Joel

v15-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
Description: Binary data


v15-0002-Add-raw-format-to-COPY-command.patch
Description: Binary data


v15-0003-Reorganize-option-validations.patch
Description: Binary data


Re: msvc directory missing in PostgreSQL 17.0

2024-10-28 Thread 黄铎彦
On 2024-10-28 09:42 GMT+8 Umar Hayat  wrote:
>[0] 
>https://www.postgresql.org/message-id/flat/CADK3HHLQ1MNmfXqEvQi36D_MQrheOZPcXv2H3s6otMbSmfwjzg%40mail.gmail.com


It worked! Thank you, Umar! My poor searching ability QwQ.
But there may be a small bug in the build ordering: after I first executed
"build -> build solution", I got the error "libpq.pdb not found". When I
executed it a second time, it completed successfully.

Re: Avoid orphaned objects dependencies, take 3

2024-10-28 Thread Bertrand Drouvot
Hi,

On Mon, Aug 19, 2024 at 03:35:14PM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, Jul 10, 2024 at 07:31:06AM +, Bertrand Drouvot wrote:
> > So, to sum up:
> > 
> > A. Locking is now done exclusively with LockNotPinnedObject(Oid classid, 
> > Oid objid)
> > so that it's now always clear what object we want to acquire a lock for. It 
> > means
> > we are not manipulating directly an object address or a list of objects 
> > address
> > as it was the case when the locking was done "directly" within the 
> > dependency code.
> > 
> > B. A special case is done for objects that belong to the RelationRelationId 
> > class.
> > For those, we should be in one of the two following cases that would already
> > prevent the relation to be dropped:
> > 
> >  1. The relation is already locked (could be an existing relation or a 
> > relation
> >  that we are creating).
> > 
> >  2. The relation is protected indirectly (i.e an index protected by a lock 
> > on
> >  its table, a table protected by a lock on a function that depends the 
> > table...)
> > 
> > To avoid any risks for the RelationRelationId class case, we acquire a lock 
> > if
> > there is none. That may add unnecessary lock for 2. but that seems worth 
> > it. 
> > 
> 
> Please find attached v16, mandatory rebase due to 80ffcb8427.

rebased (v17 attached).
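
A hedged sketch of the idea summarized above, using the schema/function
scenario: take a lock that conflicts with a concurrent DROP before recording
the dependency, then re-check that the referenced object still exists.
LockNotPinnedObject() is the function introduced by the patch itself; the
surrounding calls are illustrative only:

	ObjectAddress myself;		/* the object being created, e.g. a new function */
	ObjectAddress referenced;	/* the schema it depends on */

	ObjectAddressSet(referenced, NamespaceRelationId, namespaceId);

	/* conflicts with a concurrent DROP SCHEMA until our transaction ends */
	LockNotPinnedObject(NamespaceRelationId, namespaceId);

	/* the patch re-checks that the schema still exists here, then records */
	recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);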

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From e08d57be0ae5cfe6713cfa4b241320fffd80f822 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v17] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place before the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 committs (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with less details).

The patch takes into account any type of objects except the ones that are pinned
(they are not droppable because the system requires it).

A special case is done for objects that belong to the RelationRelationId class.
For those, we should be in one of the two following cases that would already
prevent the relation to be dropped:

1. The relation is already locked (could be an existing relation or a relation
that we are creating).

2. The relation is protected indirectly (i.e an index protected by a lock on
its table, a table protected by a lock on a function that depends the table...)

To avoid any risks for the RelationRelationId class case, we acquire a lock if
there is none. That may add unnecessary lock for 2. but that's worth it.

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/aclchk.c  |   1 +
 src/backend/catalog/dependency.c  | 212 ++
 src/backend/catalog/heap.c|   7 +
 src/backend/catalog/index.c   |  26 +++
 src/backend/catalog/objectaddress.c   |  57 +
 src/backend/catalog/pg_aggregate.c|   9 +
 src/backend/catalog/pg_attrdef.c  |   1 +
 src/backend/catalog/pg_cast.c |   5 +
 src/backend/catalog/pg_collation.c|   1 +
 src/backend/catalog/pg_constraint.c   |  26 +++
 src/backend/catalog/pg_conversion.c   |   2 +
 src/backend/catalog/pg_depend.c   |  39 +++-
 src/backend/catalog/pg_operator.c |  19 ++
 src/backend/catalog/pg_proc.c |  17 +-
 src/backend/catalog/pg_publication.c  |  11 +
 src/backend/catalog/pg_range.c|   6 +
 src/backend/catalog/pg_type.c |  39 
 src/backend/catalog/toasting.c|   1 +
 src/backend/commands/alter.c  |   4 +
 src/backend/commands/amcmds.c |   1 +
 src/

Re: Add isolation test template in injection_points for wait/wakeup/detach

2024-10-28 Thread Bertrand Drouvot
Hi,

On Fri, Oct 25, 2024 at 09:33:12AM +0900, Michael Paquier wrote:
> This is in the first permutation of the test done with "wait1 wakeup2
> detach2", and the diff means that the backend running the "wait"
> callback is reported as finished after the detach is done,
> injection_points_run being only used for the waits.  Hence the wait is
> so slow to finish that the detach has time to complete and finish,
> breaking the output.

Yeah, I agree with your analysis.

> And here comes the puzzling part: all of failures involve FreeBSD 13
> in the CI.  Reproducing this failure would not be difficult, I thought
> first; we can add a hardcoded pg_usleep() to delay the end of
> injection_wait() so as we make sure that the backend doing the wait
> reports later than the detach.  Or just do the same at the end of
> injection_points_run() once the callback exits.  I've sure done that,
> placing some strategic pg_usleep() calls on Linux to make the paths
> that matter in the wait slower, but the test remains stable.

Right, I did the same and observed the exact same behavior.

Still, it's possible to observe the s1 wait finishing after the s2 detach by
running this test manually (i.e., create two sessions and run the test commands
manually) and:

1. attach a debugger to the first session (say with a breakpoint in
injection_wait()),
or
2. add a hardcoded pg_usleep() in injection_wait().

So I think that the s1 wait finishing after the s2 detach is possible if
session 1 is "frozen" (gdb case) or slow enough (pg_usleep() case).

> The CI
> on Linux is stable as well: 3rd and 4th columns of the cfbot are
> green, I did not spot any failures related to this isolation test in
> injection_points.  Only the second column about FreeBSD is going rogue
> on a periodic basis.
> 
> One fix would is to remove this first permutation test, still that's
> hiding a problem rather than solving it, and something looks wrong
> with conditional variables, specific to FreeBSD?

Hum, we would probably observe other failures in other tests no?

> Any thoughts or comments?

I think that we cannot be 100% sure that the s1 wait will finish before the
s2 detach (easily reproducible with gdb attached to s1 or a hardcoded sleep), and
that other OSes could also report the test as failing for the same reason.

It's not ideal, but instead of removing this first permutation test, what about
adding a "sleep2" step to it (doing, say, SELECT pg_sleep(1);) and calling this
new step before the detach2 one?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: POC, WIP: OR-clause support for indexes

2024-10-28 Thread jian he
 * NOTE:  returns NULL if clause is an OR or AND clause; it is the
 * responsibility of higher-level routines to cope with those.
 */
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
 RestrictInfo *rinfo,
 int indexcol,
 IndexOptInfo *index)

the above comments need a slight change.


EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
OR thousand = 3);
                        QUERY PLAN
----------------------------------------------------------
 Bitmap Heap Scan on tenk2
   Recheck Cond: ((thousand = 1) OR (thousand = 3))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
         Index Cond: (thousand = ANY ('{1,3}'::integer[]))

EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
                        QUERY PLAN
----------------------------------------------------------
 Bitmap Heap Scan on tenk2
   Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
         Index Cond: (thousand = ANY ('{1,3}'::integer[]))

tenk2 index:
Indexes:
"tenk2_thous_tenthous" btree (thousand, tenthous)

Looking at the above cases, I found that the "Recheck Cond" is
different from the "Index Cond".
I wonder why there is a difference, or whether they should be the same.
Then I came to
match_orclause_to_indexcol:

/*
 * Finally, build an IndexClause based on the SAOP node. Use
 * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
 * estimations because it may differ from the estimation made for an OR
 * clause. Although it is not a lossy expression, keep the old version of
 * rinfo in iclause->rinfo to detect duplicates and recheck the original
 * clause.
 */
iclause = makeNode(IndexClause);
iclause->rinfo = rinfo;
iclause->indexquals = list_make1(make_simple_restrictinfo(root,
  &saopexpr->xpr));
iclause->lossy = false;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;

Looking at create_bitmap_scan_plan, I think "iclause->rinfo" itself won't be
able to detect duplicates, since the upper code would mostly use
"iclause->indexquals" for comparison?


typedef struct IndexClause comments says:
"
 * indexquals is a list of RestrictInfos for the directly-usable index
 * conditions associated with this IndexClause.  In the simplest case
 * it's a one-element list whose member is iclause->rinfo.  Otherwise,
 * it contains one or more directly-usable indexqual conditions extracted
 * from the given clause.  The 'lossy' flag indicates whether the
 * indexquals are semantically equivalent to the original clause, or
 * represent a weaker condition.
"
Should iclause->lossy be true at the end of match_orclause_to_indexcol,
since it meets the comment condition "semantically equivalent to the
original clause"?
Or is the above comment slightly wrong?

In match_orclause_to_indexcol
I changed
iclause->rinfo = rinfo;
to
 iclause->rinfo = make_simple_restrictinfo(root,
&saopexpr->xpr);

As expected, now the "Recheck Cond" is the same as the "Index Cond":
   Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
 Index Cond: (thousand = ANY ('{1,3}'::integer[]))

I am not sure of the implication of this change.




Re: further #include cleanup (IWYU)

2024-10-28 Thread Peter Eisentraut

On 20.10.24 11:37, Alvaro Herrera wrote:

On 2024-Oct-20, Peter Eisentraut wrote:


diff --git a/contrib/tablefunc/tablefunc.h b/contrib/tablefunc/tablefunc.h
index 2009382ce7d..b78030044b5 100644
--- a/contrib/tablefunc/tablefunc.h
+++ b/contrib/tablefunc/tablefunc.h
@@ -34,6 +34,4 @@
  #ifndef TABLEFUNC_H
  #define TABLEFUNC_H
  
-#include "fmgr.h"

-
  #endif/* TABLEFUNC_H 
*/


You could as well just delete this file.


I have committed it with the file deleted.





Re: further #include cleanup (IWYU)

2024-10-28 Thread Peter Eisentraut

On 20.10.24 11:53, Alvaro Herrera wrote:

On 2024-Oct-20, Peter Eisentraut wrote:

diff --git a/src/bin/pg_dump/pg_backup_utils.c 
b/src/bin/pg_dump/pg_backup_utils.c
index a0045cf5e58..80715979a1a 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -13,7 +13,9 @@
   */
  #include "postgres_fe.h"
  
+#ifdef WIN32

  #include "parallel.h"
+#endif
  #include "pg_backup_utils.h"


This seems quite weird and I think it's just because exit_nicely() wants
to do _endthreadex().  Maybe it'd be nicer to add a WIN32-specific
on_exit_nicely_list() callback that does that in parallel.c, and do away
with the inclusion of parallel.h in pg_backup_utils.c entirely?


I was thinking the same thing.  But maybe that should be a separate project.


diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index a09247fae47..78e91f6e2dc 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -63,7 +63,9 @@
  #include "fe_utils/string_utils.h"
  #include "parallel.h"
  #include "pg_backup_utils.h"
+#ifdef WIN32
  #include "port/pg_bswap.h"
+#endif


This looks really strange, but then parallel.c seems to have embedded
its own portability layer within itself.


The reason for this one is that pgpipe() uses pg_hton16() and 
pg_hton32().  We could use htons() and htonl() here instead.  That would 
effectively revert that part of commit 0ba99c84e8c.





Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Alena Rybakina

Hi! Thank you for your work on this subject!

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:

On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov  wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key.  That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values.  It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out.  It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.

It seems that your idea with an equivalence class works.
In the patchset attached, I show all the ideas (I initially wanted to 
discuss it step-by-step).
In the attached, the first patch is about choosing adequate expression 
from the list of EC members.
The second one is the current patch, which considers the number of 
sort columns.
Third patch - elaboration of the second patch. Use only the trustful 
statistics here - the number of ndistinct values on the first sorted 
column.
And the last patch is a demo of how I'm going to use the previous 
three patches and add one more strategy to improve the order of 
columns in the GROUP-BY clause.



I have some questions about your patches.

1. You consider two lists root->group_pathkeys and 
root->processed_groupClause.


As far as I understand, this is because the pathkey elements are identical
to the group clause, and that identity is checked precisely by this
condition:


if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)

To be honest, I didn’t find information about this in the code, but did 
I understand everything correctly?


2. I noticed that statistics of distinct values are calculated several
times. Maybe I'm missing something, but this can be avoided if these
statistics are saved after calculation. For example, I saw that you may
calculate the same statistics for the same equivalence members in both
cost_incremental_sort and identify_sort_ecmember.
Is it possible to store this information in a structure and use it later?


3. I think you should initialize the variable ndist in your patch. I
got a compiler error when building your code:


costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized 
[-Werror=maybe-uninitialized]

 6694 | em->em_ndistinct = ndist;
  | ~^
costsize.c:6680:33: note: ‘ndist’ was declared here
 6680 | double  ndist;
  | ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [: costsize.o] Error 1

4. Before calling the examine_variable function, I think you should check
that the node is not NULL.


@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root, 
Path *path)

              */
             continue;

+        Assert (node != NULL);
+
         examine_variable(root, node, 0, &vardata);
         if (!HeapTupleIsValid(vardata.statsTuple))
             continue;


--
Regards,
Alena Rybakina
Postgres Professional





Re: Can rs_cindex be < 0 for bitmap heap scans?

2024-10-28 Thread Dilip Kumar
On Sat, Oct 26, 2024 at 5:30 PM Melanie Plageman 
wrote:

> On Sat, Oct 26, 2024 at 12:17 AM Dilip Kumar 
> wrote:
> >
> > On Fri, Oct 25, 2024 at 10:07 PM Melanie Plageman <
> melanieplage...@gmail.com> wrote:
> >>
> >> On Fri, Oct 25, 2024 at 10:29 AM Melanie Plageman
> >>  wrote:
> >> >
> >> > Tom suggested off-list that if rs_cindex can't be zero, then it should
> >> > be unsigned. I checked the other scan types using the
> >> > HeapScanDescData, and it seems none of them use values of rs_cindex or
> >> > rs_ntuples < 0. As such, here is a patch making both rs_ntuples  and
> >> > rs_cindex unsigned.
> >>
> >
> > @@ -943,8 +945,8 @@ heapgettup_pagemode(HeapScanDesc scan,
> >  {
> >   HeapTuple tuple = &(scan->rs_ctup);
> >   Page page;
> > - int lineindex;
> > - int linesleft;
> > + uint32 lineindex;
> > + uint32 linesleft;
> >
> > IMHO we can't make  "lineindex" as uint32, because just see the first
> code block[1] of heapgettup_pagemode(), we use this index as +ve (Forward
> scan )as well as -ve (Backward scan).
>
> Yes, so in the case of backwards scan, if scan->rs_cindex is 0, when
> dir is -1, lineindex will wrap around, but we don't use it in that
> case because linesleft will be 0 and then we will not meet the
> conditions to execute the code in the loop under continue_page. And in
> the loop, when adding -1 to lineindex, for backwards scan, it always
> starts at linesleft and ticks down to 0.
>

Yeah, you are right: although the lineindex will wrap around when rs_cindex
is 0, it would not be used.  So, it won't actually cause any issues, but
I'm not comfortable with the unintentional wraparound. I would have left
"scan->rs_cindex" as int itself, but I am fine with whatever you prefer.

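To make the wraparound concern concrete, here is a tiny standalone
illustration (not the actual heapam code): with an unsigned index, adding
dir = -1 wraps to UINT32_MAX, but the loop guard on linesleft means the
wrapped value is never used to read a tuple.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t    lineindex = 0;  /* like rs_cindex at the end of a backward scan */
    uint32_t    linesleft = 0;  /* nothing left to return from this page */
    int         dir = -1;       /* backward scan */

    /* Mirrors the guard shape in heapgettup_pagemode(): the body is skipped. */
    while (linesleft > 0)
    {
        /* ... process the tuple at lineindex ... */
        lineindex += dir;
        linesleft--;
    }

    /* The wrapped value exists, but nothing ever indexes with it. */
    lineindex += dir;
    printf("wrapped lineindex = %" PRIu32 "\n", lineindex);    /* 4294967295 */
    return 0;
}
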

> We could add an if statement above the goto that says something like
> if (linesleft > 0)
> goto continue_page;
>
> Would that make it clearer?
>

Not sure it would make it clearer.  In fact, in common cases it would add
an extra instruction to check if (linesleft > 0).


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
On Mon, Oct 28, 2024 at 5:45 PM Amit Kapila  wrote:
>
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
> >
> > 7.
> > +$node_publisher->wait_for_catchup('sub_gen');
> > +
> > +is( $node_subscriber->safe_psql(
> > + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> > + qq(1|2),
> > + 'replication with generated columns in column list');
> > +
> >
> > But, this is only testing normal replication. You should also include
> > some initial table data so you can test that the initial table
> > synchronization works too. Otherwise, I think current this patch has
> > no proof that the initial 'copy_data' even works at all.
> >
>
> Per my tests, the initial copy doesn't work with 0001 alone. It needs
> changes in table sync.c from the 0002 patch. Now, we can commit 0001
> after fixing comments and mentioning in the commit message that this
> patch supports only the replication of generated columns when
> specified in the column list. The initial sync and replication of
> generated columns when not specified in the column list will be
> supported in future commits. OTOH, if the change to make table sync
> work is simple, we can even combine that change.
>

If this comes to a vote, then my vote is to refactor the necessary
tablesync COPY code back into patch 0001 so that patch 0001 can
replicate initial data properly stand alone.

Otherwise (if we accept that patch 0001 only partly works, like now), users
would have to jump through hoops to get any benefit from this patch by
itself. This is particularly true because the CREATE SUBSCRIPTION
'copy_data' parameter default is true, so patch 0001 is going to be
broken by default if there is any pre-existing table data when
publishing generated columns to default subscriptions.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Inval reliability, especially for inplace updates

2024-10-28 Thread Nitin Motiani
On Thu, Oct 24, 2024 at 8:24 AM Noah Misch  wrote:
>
> With the releases wrapping in 2.5 weeks, I'm ambivalent about pushing this
> before the release or after.  Pushing before means fewer occurrences of
> corruption, but pushing after gives more bake time to discover these changes
> were defective.  It's hard to predict which helps users more, on a
> risk-adjusted basis.  I'm leaning toward pushing this week.  Opinions?
>

I lean towards pushing after the release. This is based on my
assumption that since this bug has been around for a while, it is
(probably) not hit often. And a few weeks delay is better than
introducing a new defect.

Thanks




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-28 Thread Alexander Lakhin

Hello Tom,

27.10.2024 20:41, Tom Lane wrote:

I wrote:

In the no-good-deed-goes-unpunished department: buildfarm member
hamerkop doesn't like this patch [1].  The diffs look like
...
So what I'd like to do to fix this is to change
-   if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
+   if ((file = AllocateFile(filename, "r")) == NULL)

Well, that didn't fix it :-(.  I went so far as to extract the raw log
files from the buildfarm database, and what they show is that there is
absolutely no difference between the lines diff is claiming are
different:

-QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n
+QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n

It's the same both before and after 924e03917, which made the code
change depicted above, so that didn't help.

So I'm pretty baffled.  I suppose the expected and result files must
actually be different, and something in subsequent processing is
losing the difference before it gets to the buildfarm database.
But I don't have the ability to debug that from here.  Does anyone
with access to hamerkop want to poke into this?

Without additional information, the only thing I can think of that
I have any confidence will eliminate these failures is to reformat
the affected test cases so that they produce just a single line of
output.  That's kind of annoying from a functionality-coverage point
of view, but I'm not sure avoiding it is worth moving mountains for.



I've managed to reproduce the issue with the core.autocrlf=true git setting
(which sets CR+LF line ending in test_ext7--2.0--2.1bad.sql) and with diff
from msys:
C:\msys64\usr\bin\diff.exe --version
diff (GNU diffutils) 3.8

(Gnu/Win32 Diff [1] doesn't detect those EOL differences and thus the test
doesn't fail.)

I can really see different line endings with hexdump:
hexdump -C ...testrun\test_extensions\regress\regression.diffs
0230  20 20 20 20 20 20 20 20  5e 0a 2d 51 55 45 52 59  | ^.-QUERY|
0240  3a 20 20 43 52 45 41 54  45 20 46 55 4e 43 54 49  |: CREATE FUNCTI|
0250  4e 20 6d 79 5f 65 72 72  6f 6e 65 6f 75 73 5f 66  |N my_erroneous_f|
0260  75 6e 63 28 69 6e 74 29  20 52 45 54 55 52 4e 53 |unc(int) RETURNS|
0270  20 69 6e 74 20 4c 41 4e  47 55 41 47 45 20 53 51  | int LANGUAGE SQ|
0280  4c 0a 2b 51 55 45 52 59  3a 20 20 43 52 45 41 54 |L.+QUERY:  CREAT|
0290  45 20 46 55 4e 43 54 49  4e 20 6d 79 5f 65 72 72  |E FUNCTIN my_err|
02a0  6f 6e 65 6f 75 73 5f 66  75 6e 63 28 69 6e 74 29 |oneous_func(int)|
02b0  20 52 45 54 55 52 4e 53  20 69 6e 74 20 4c 41 4e  | RETURNS int LAN|
02c0  47 55 41 47 45 20 53 51  4c 0d 0a 20 41 53 20 24  |GUAGE SQL.. AS $|

hexdump -C .../testrun/test_extensions/regress/results/test_extensions.out | 
grep -C5 FUNCTIN
0b80  20 5e 0d 0a 51 55 45 52  59 3a 20 20 43 52 45 41  | ^..QUERY:  CREA|
0b90  54 45 20 46 55 4e 43 54  49 4e 20 6d 79 5f 65 72  |TE FUNCTIN my_er|
0ba0  72 6f 6e 65 6f 75 73 5f  66 75 6e 63 28 69 6e 74 |roneous_func(int|
0bb0  29 20 52 45 54 55 52 4e  53 20 69 6e 74 20 4c 41  |) RETURNS int LA|
0bc0  4e 47 55 41 47 45 20 53  51 4c 0d 0d 0a 41 53 20  |NGUAGE SQL...AS |

whilst
hexdump -C .../src/test/modules/test_extensions/expected/test_extensions.out | 
grep -C5 FUNCTIN
0b80  20 5e 0d 0a 51 55 45 52  59 3a 20 20 43 52 45 41  | ^..QUERY:  CREA|
0b90  54 45 20 46 55 4e 43 54  49 4e 20 6d 79 5f 65 72  |TE FUNCTIN my_er|
0ba0  72 6f 6e 65 6f 75 73 5f  66 75 6e 63 28 69 6e 74 |roneous_func(int|
0bb0  29 20 52 45 54 55 52 4e  53 20 69 6e 74 20 4c 41  |) RETURNS int LA|
0bc0  4e 47 55 41 47 45 20 53  51 4c 0d 0a 41 53 20 24  |NGUAGE SQL..AS $|

It looks like --strip-trailing-cr doesn't work as desired for this diff version.

I've also dumped buf in read_whole_file() and found that in both
PG_BINARY_R and "r" modes the 0d 0a ending is preserved. But it changed
to 0a with the "rt" mode (see [2]), and that makes the test (and the whole
`meson test`) pass for me.
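
For anyone unfamiliar with the Windows behaviour involved, here is a small
standalone demonstration of the difference between binary-mode and text-mode
stdio reads; on Windows the text-mode read strips the CR, while the
binary-mode read returns the raw bytes (on POSIX both reads are identical).

#include <stdio.h>

int
main(void)
{
    const char *path = "crlf_demo.txt";
    char        buf[16];
    size_t      n;
    FILE       *f;

    /* Write "X\r\n" verbatim by using binary mode for the write. */
    f = fopen(path, "wb");
    if (f == NULL)
        return 1;
    fwrite("X\r\n", 1, 3, f);
    fclose(f);

    /* Binary-mode read ("rb"/PG_BINARY_R): always sees the raw 0x0d 0x0a. */
    f = fopen(path, "rb");
    if (f == NULL)
        return 1;
    n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    printf("binary read: %zu bytes\n", n);      /* 3 everywhere */

    /* Text-mode read ("r", or explicit "rt" with MSVC): CR stripped on Windows. */
    f = fopen(path, "r");
    if (f == NULL)
        return 1;
    n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    printf("text read:   %zu bytes\n", n);      /* 2 on Windows, 3 on POSIX */

    return 0;
}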

[1] https://gnuwin32.sourceforge.net/packages/diffutils.htm
[2] 
https://learn.microsoft.com/en-us/cpp/c-runtime-library/file-translation-constants?view=msvc-170

Best regards,
Alexander




Re: sslinfo extension - add notbefore and notafter timestamps

2024-10-28 Thread Daniel Gustafsson
The recent bump in minmum required versions of OpenSSL and LibreSSL made me
remember to revisit this patch which was previously reverted due to library
incompatibility (with *both* OpenSSL and LibreSSL on different APIs).

The attached removes the timestamp conversion workaround which is no longer
needed.

--
Daniel Gustafsson



v13-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch
Description: Binary data


Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 08:56, jian he wrote:
>   /* Check force_quote */
> - if (!opts_out->csv_mode && (opts_out->force_quote ||
> opts_out->force_quote_all))
> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
> + opts_out->force_quote_all))
>   ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>
> maybe this has a code indentation issue.
> since "if" and "opts_out" in the same column position.

Thanks for review.

I've fixed the indentation issues.

> It came to my mind,
> change errmsg occurrence of  "BINARY mode", "CSV mode" to "binary
> format",  "csv format" respectively.
> I think "format" would be more accurate.
> but the message seems invasive,
> so i guess we need to use "mode".

That would work, I'm fine with either.

> overall v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
> looks good to me.

Cool.

/Joel

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
Description: Binary data


v14-0002-Add-raw-format-to-COPY-command.patch
Description: Binary data


v14-0003-Reorganize-option-validations.patch
Description: Binary data


Re: Reordering DISTINCT keys to match input path's pathkeys

2024-10-28 Thread Andrei Lepikhov

On 6/7/24 16:46, Richard Guo wrote:

On Mon, Feb 5, 2024 at 11:18 AM Richard Guo  wrote:

cfbot reminds that this patch does not apply any more.  So I've rebased
it on master, and also adjusted the test cases a bit.


This patch does not apply any more, so here is a new rebase, with some
tweaks to the comments.

This patch needs a minor rebase again.
After skimming the code, I want to say that it looks good. But maybe, to 
avoid one more *_reordering GUC, it would be better to cover all pathkey 
reorderings under a single GUC.


--
regards, Andrei Lepikhov





Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Alvaro Herrera
On 2024-Oct-27, Tender Wang wrote:

> BTW, while reviewing the v2 patch, I found the parentConTup in
> foreach(cell, fks) block
> didn't need it anymore. We can remove the related codes.

True.  Done so in this v3.

I noticed another problem here: we're grabbing the wrong lock type on
the referenced rel (AccessShareLock) during detach.  (What's more: we
release it afterwards, which is the wrong thing to do.  We need to keep
such locks until end of transaction).  I didn't try to construct a case
where this would be a problem, but if I change AccessShare to NoLock,
the assertion that says we don't hold _any_ lock on that relation fires,
which means that we're not taking any locks on those rels before this
point.  So this lock strength choice is definitely wrong.  I changed it
to ShareRowExclusive, which is what we're supposed to use when adding a
trigger.  Another option might be to do find_all_inheritors() ahead of
time to grab all the necessary locks, but I didn't try to do that.  I
also added an assertion in addFkRecurseReferenced to verify that we hold
that in all paths, and after this change it doesn't fire anymore with
the regression tests.

I have still not edited the commit message.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From 3f18de0a263e507b79b1f1eb9a9a128cd6292779 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Sat, 26 Oct 2024 23:44:58 +0200
Subject: [PATCH v3] No need to use an attrmap when detaching a foreign key

The reason is that the constraint being created is on the same relation
as the constraint that it spawns from.
---
 src/backend/commands/tablecmds.c  | 25 ---
 src/test/regress/expected/foreign_key.out |  9 +---
 src/test/regress/sql/foreign_key.sql  | 10 ++---
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e14bc0c0548..c993abc9792 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10372,6 +10372,8 @@ addFkRecurseReferenced(Constraint *fkconstraint, Relation rel,
 	Oid			deleteTriggerOid,
 updateTriggerOid;
 
+	Assert(CheckRelationLockedByMe(pkrel, ShareRowExclusiveLock, true));
+
 	/*
 	 * Create the action triggers that enforce the constraint.
 	 */
@@ -19436,8 +19438,7 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 	foreach(cell, fks)
 	{
 		ForeignKeyCacheInfo *fk = lfirst(cell);
-		HeapTuple	contup,
-	parentConTup;
+		HeapTuple	contup;
 		Form_pg_constraint conform;
 		Oid			insertTriggerOid,
 	updateTriggerOid;
@@ -19455,13 +19456,6 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			continue;
 		}
 
-		Assert(OidIsValid(conform->conparentid));
-		parentConTup = SearchSysCache1(CONSTROID,
-	   ObjectIdGetDatum(conform->conparentid));
-		if (!HeapTupleIsValid(parentConTup))
-			elog(ERROR, "cache lookup failed for constraint %u",
- conform->conparentid);
-
 		/*
 		 * The constraint on this table must be marked no longer a child of
 		 * the parent's constraint, as do its check triggers.
@@ -19502,7 +19496,6 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			Oid			conffeqop[INDEX_MAX_KEYS];
 			int			numfkdelsetcols;
 			AttrNumber	confdelsetcols[INDEX_MAX_KEYS];
-			AttrMap*attmap;
 			Relation	refdRel;
 
 			DeconstructFkConstraintRow(contup,
@@ -19535,20 +19528,19 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			fkconstraint->old_pktable_oid = InvalidOid;
 			fkconstraint->location = -1;
 
-			attmap = build_attrmap_by_name(RelationGetDescr(partRel),
-		   RelationGetDescr(rel),
-		   false);
+			/* set up colnames, used to generate the constraint name */
 			for (int i = 0; i < numfks; i++)
 			{
 Form_pg_attribute att;
 
 att = TupleDescAttr(RelationGetDescr(partRel),
-	attmap->attnums[conkey[i] - 1] - 1);
+	conkey[i] - 1);
+
 fkconstraint->fk_attrs = lappend(fkconstraint->fk_attrs,
  makeString(NameStr(att->attname)));
 			}
 
-			refdRel = table_open(fk->confrelid, AccessShareLock);
+			refdRel = table_open(fk->confrelid, ShareRowExclusiveLock);
 
 			addFkRecurseReferenced(fkconstraint, partRel,
    refdRel,
@@ -19565,11 +19557,10 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
    true,
    InvalidOid, InvalidOid,
    conform->conperiod);
-			table_close(refdRel, AccessShareLock);
+			table_close(refdRel, NoLock);	/* keep lock till end of xact */
 		}
 
 		ReleaseSysCache(contup);
-		ReleaseSysCache(parentConTup);
 	}
 	list_free_deep(fks);
 	if (trigrel)
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index b73e7dced8f..69994c98e32 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.

Re: IPC::Run::time[r|out] vs our TAP tests

2024-10-28 Thread Heikki Linnakangas

On 09/04/2024 20:10, Daniel Gustafsson wrote:

Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it).  Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.


+1. The patch looks good to me. I didn't comb through the tests to check 
for bugs of omission, i.e. cases where the caller would need adjustments 
because of this, I trust that you found them all.



=item $session->quit

Close the session and clean up resources. Each test run must be closed with
C.  Returns TRUE when the session was cleanly terminated, otherwise
FALSE.  A testfailure will be issued in case the session failed to finish.


What does "session failed to finish" mean? Does it mean the same as "not 
cleanly terminated", i.e. a test failure is issued whenever this returns 
FALSE?


typo: testfailure -> test failure



diff --git a/src/test/recovery/t/031_recovery_conflict.pl 
b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823fd..62936f52d20 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
 -- wait for lock held by prepared transaction
SELECT * FROM $table2;
 ]);
-ok(1,
-   "$sect: cursor holding conflicting pin, also waiting for lock, 
established"
-);
+isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, 
established");
 
 # just to make sure we're waiting for lock already

 ok( $node_standby->poll_query_until(
diff --git a/src/test/recovery/t/037_invalid_database.pl 
b/src/test/recovery/t/037_invalid_database.pl
index 6d1c7117964..c8c20077f85 100644
--- a/src/test/recovery/t/037_invalid_database.pl
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -94,6 +94,7 @@ is($node->psql('postgres', 'DROP DATABASE 
regression_invalid'),
 my $cancel = $node->background_psql('postgres', on_error_stop => 1);
 my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
 my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+isnt($pid, undef, 'Get backend PID');
 
 # create the database, prevent drop database via lock held by a 2PC transaction

 ok( $bgpsql->query_safe(


I'm not sure I understand these changes. These can only fail if the 
query() or query_until() function times out, right? In that case, the 
query() or query_until() would also report a test failure, so these 
additional checks after the call seem redundant. They don't do any harm 
either, but I wonder why have them in these particular call sites and 
not in other query() or query_until() calls.



The tab completion test can use the API call for automatically restarting the
timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
really not necessary).


+1


Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:

$back_q->query_until(
qr/logical_slot_get_changes/, q(
   \echo logical_slot_get_changes
   SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));

...  ...

# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;

There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point.  This might still work as intended, but the comment is
slightly misleading IMO.


Agreed, it would be good to actually check that it returns.


recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating.  Calling ->quit is
mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
this.


Alexander included this fix in commit 3c5db1d6b016 already.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Reduce one comparison in binaryheap's sift down

2024-10-28 Thread Robert Haas
On Mon, Oct 28, 2024 at 11:22 AM cca5507  wrote:
> I think we can reduce one comparison in binaryheap's sift down, right?
>
> Here is a patch to fix it.

Hmm, so at present we compare the parent to the left child and to the
right child. If it's smaller than neither, everything is OK. If it's
smaller than one, we swap it with that one. If it's smaller than both,
we compare the left and right child with each other and swap the
parent with the larger of the two. Hence, if a node has 2 children, we
always do 2 comparisons, and we sometimes do 3 comparisons.

With the patch, we first compare the two children to each other, and
then compare the larger one to the parent. If the parent is smaller
than the larger child, we swap them. Hence, if a node has 2 children,
we always do 2 comparisons.

Unless I'm missing something, that does seem significantly better.
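
For reference, a minimal sketch of the comparison order described above,
written for a max-heap stored in a 0-based array; this is an illustration of
the idea, not the binaryheap.c code.

#include <stdio.h>

static void
sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int     left = 2 * i + 1;
        int     right = left + 1;
        int     largest;
        int     tmp;

        if (left >= n)
            break;              /* no children, done */

        /* comparison 1: with two children, find the larger one first */
        largest = left;
        if (right < n && heap[right] > heap[left])
            largest = right;

        /* comparison 2: parent vs the larger child */
        if (heap[i] >= heap[largest])
            break;

        tmp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = tmp;
        i = largest;
    }
}

int
main(void)
{
    int     h[] = {1, 9, 8, 4, 7, 3, 5};
    int     n = sizeof(h) / sizeof(h[0]);

    for (int i = n / 2 - 1; i >= 0; i--)    /* heapify bottom-up */
        sift_down(h, n, i);

    for (int i = 0; i < n; i++)
        printf("%d ", h[i]);
    printf("\n");               /* h[0] is now the maximum (9) */
    return 0;
}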

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-10-28 Thread Jacob Champion
On Mon, Oct 28, 2024 at 6:24 AM Daniel Gustafsson  wrote:
> > On 25 Oct 2024, at 20:22, Jacob Champion  
> > wrote:
>
> > I have combed almost all of Daniel's feedback backwards into the main
> > patch (just the new bzero code remains, with the open question
> > upthread),
>
> Re-reading I can't see a vector there, I guess I am just scarred from what
> seemed to be harmless leaks in auth codepaths and treat every bit as
> potentially important.  Feel free to drop from the patchset for now.

Okay. For authn_id specifically, which isn't secret and doesn't have
any power unless it's somehow copied into the ClientConnectionInfo,
I'm not sure that the bzero() gives us much. But I do see value in
clearing out, say, the Bearer token once we're finished with it.

Also in this validate() code path, I'm taking a look at the added
memory management with the pfree():
1. Should we add any more ceremony to the returned struct, to try to
ensure that the ABI matches? Or is it good enough to declare that
modules need to be compiled against a specific server version?
2. Should we split off a separate memory context to contain
allocations made by the validator?
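
On question 2, the usual pattern for isolating a callback's allocations
would look roughly like this; run_validator() is a placeholder here, not the
actual hook interface, and this is only a sketch of the memory-context idea.

#include "postgres.h"
#include "utils/memutils.h"

/* Placeholder for the module's callback; not the real hook signature. */
extern bool run_validator(const char *token);

static bool
validate_in_own_context(const char *token)
{
    MemoryContext validator_cxt;
    MemoryContext oldcxt;
    bool        ok;

    validator_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                          "OAuth validator",
                                          ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(validator_cxt);

    /* Anything the validator palloc()s lands in validator_cxt. */
    ok = run_validator(token);

    /* Copy out anything worth keeping before deleting the context. */
    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(validator_cxt);     /* releases it all in one step */

    return ok;
}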

> Looking more at the patchset I think we need to apply conditional compilation
> of the backend for oauth like how we do with other opt-in schemes in configure
> and meson.  The attached .txt has a diff for making --with-oauth a requirement
> for compiling support into backend libpq.

Do we get the flexibility we need with that approach? With other
opt-in schemes, the backend and the frontend both need some sort of
third-party dependency, but that's not true for OAuth. I could see
some people wanting to support an offline token validator on the
server side but not wanting to build the HTTP dependency into their
clients.

I was considering going in the opposite direction: With the client
hooks, a user could plug in their own implementation without ever
having to touch the built-in flow, and I'm wondering if --with-oauth
should really just be --with-builtin-oauth or similar. Then if the
server sends OAUTHBEARER, the client only complains if it doesn't have
a flow available to use, rather than checking USE_OAUTH. This kind of
ties into the other big open question of "what do we do about users
that don't want the additional overhead of something they're not
using?"

--Jacob




Re: define pg_structiszero(addr, s, r)

2024-10-28 Thread Ranier Vilela
Em seg., 28 de out. de 2024 às 12:08, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > It seems to me that [reversing the loop direction] is more optimized.
>
> That's far from clear: you're ignoring the possibility that memory
> access logic is better optimized for forward scanning than reverse
> scanning.  I'd stick with the forward scan without some extremely
> detailed testing.
>
I don't disagree.
After posting, I was wondering: if there is a first non-zero byte, it is
probably at the beginning and not at the end.
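
For readers following the thread, a simplified stand-in for the helper being
discussed (not the actual pg_structiszero() proposal) is just a forward byte
scan that stops at the first non-zero byte.

#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in: scan forward and stop at the first non-zero byte.
 * If the memory is "dirty", that byte tends to be found early, which is
 * one argument for keeping the forward scan.
 */
static bool
bytes_are_all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}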

best regards,
Ranier Vilela


Re: POC, WIP: OR-clause support for indexes

2024-10-28 Thread Alena Rybakina

Hi, Jian! Thank you for your work on this topic!

On 28.10.2024 10:19, jian he wrote:

  * NOTE:  returns NULL if clause is an OR or AND clause; it is the
  * responsibility of higher-level routines to cope with those.
  */
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
  RestrictInfo *rinfo,
  int indexcol,
  IndexOptInfo *index)

the above comments need a slight change.


EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
OR thousand = 3);
 QUERY PLAN
---
  Bitmap Heap Scan on tenk2
Recheck Cond: ((thousand = 1) OR (thousand = 3))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
 QUERY PLAN
---
  Bitmap Heap Scan on tenk2
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

tenk2 index:
Indexes:
 "tenk2_thous_tenthous" btree (thousand, tenthous)

Looking at the above cases, I found that the "Recheck Cond" is
different from the "Index Cond".
I wonder why there is a difference, or if they should be the same.
Then I came to:
match_orclause_to_indexcol

 /*
  * Finally, build an IndexClause based on the SAOP node. Use
  * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
  * estimations because it may differ from the estimation made for an OR
  * clause. Although it is not a lossy expression, keep the old version of
  * rinfo in iclause->rinfo to detect duplicates and recheck the original
  * clause.
  */
 iclause = makeNode(IndexClause);
 iclause->rinfo = rinfo;
 iclause->indexquals = list_make1(make_simple_restrictinfo(root,
   &saopexpr->xpr));
 iclause->lossy = false;
 iclause->indexcol = indexcol;
 iclause->indexcols = NIL;

looking at create_bitmap_scan_plan.
I think "iclause->rinfo" itself won't be able to detect duplicates,
since the upper code would mostly use "iclause->indexquals" for comparison?


typedef struct IndexClause comments says:
"
  * indexquals is a list of RestrictInfos for the directly-usable index
  * conditions associated with this IndexClause.  In the simplest case
  * it's a one-element list whose member is iclause->rinfo.  Otherwise,
  * it contains one or more directly-usable indexqual conditions extracted
  * from the given clause.  The 'lossy' flag indicates whether the
  * indexquals are semantically equivalent to the original clause, or
  * represent a weaker condition.
"
should iclause->lossy be true at the end of match_orclause_to_indexcol,
since it meets the comment condition "semantically equivalent to the
original clause"?
Or is the above comment slightly wrong?

in match_orclause_to_indexcol
i changed from
iclause->rinfo = rinfo;
to
  iclause->rinfo = make_simple_restrictinfo(root,
 &saopexpr->xpr);

as expected. now the "Recheck Cond" is same as "Index Cond"
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

I am not sure of the implication of this change.

I may be wrong, but the original idea was to double-check the result 
with the original expression.


But I'm willing to agree with you. I think we should build the index quals 
from the transformed rinfo via the add_predicate_to_index_quals function. 
I have attached the diff to this email.


diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c

index 3da7ea8ed57..c68ac7008e6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3463,10 +3463,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
  * rinfo in iclause->rinfo to detect duplicates and recheck the 
original

  * clause.
  */
+    RestrictInfo *rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
 iclause = makeNode(IndexClause);
-    iclause->rinfo = rinfo;
-    iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
+    iclause->rinfo = rinfo_new;
+    iclause->indexquals = add_predicate_to_index_quals(index, 
list_make1(rinfo_new));

 iclause->lossy = false;
 iclause->indexcol = indexcol;
 iclause->indexcols = NIL;


I looked into the comments that you mentioned and found some additional 
explanation.


As I understand it, this processing is related to ensuring that the 
selectivity of the index is assessed correctly and that there is no 
underestimation, which can lead to the selection of a partial index in

Planner issue with BitmapScan recheck on external TOAST

2024-10-28 Thread Jim Nasby
I’ve been testing use of a BRIN index on record creation date (timestamptz) on 
a snapshot of a production system. Note that after creating the BRIN index the 
number of buffers being accessed jumps from 23838 to 191663. Based on what 
EXPLAIN is showing, I believe the issue is that the planner doesn’t understand 
that each additional row that goes through the (repsrv_account_ids(data) && …) 
recheck results in fetching at least one TOAST chunk. (I’d like to know if my 
assumption about TOAST is correct here; it’s the only thing I can think of to 
explain these block numbers from the 2nd EXPLAIN…)

   Heap Blocks: exact=11024
   Buffers: shared hit=191663
   ->  BitmapAnd  (cost=4903.00..4903.00 rows=14930 width=0) (actual 
time=74.704..74.705 rows=0 loops=1)
 Buffers: shared hit=1926


Unfortunately I haven’t been able to create an independent repro of this issue, 
so this report is based on PG 16 (most recent I can test in production). 
repsrv_account_ids() is a function that extracts a field from a JSONB document 
(the data field). create_date is timestamptz. There’s 17 other fields in the 
table that I’m omitting (I can share if needed, but would need to talk to some 
folks over here about it).

Indexes:
"task_execution_pkey" PRIMARY KEY, btree (id)
"task_execution__create_date_brin" brin (create_date)
"task_execution_create_date_idx" btree (create_date)

explain (analyze,buffers) SELECT 1  

  
FROM task_execution te  

WHERE (te.create_date 
BETWEEN '2024-7-1'::timestamptz AND '2024-9-5'::timestamptz)

  and repsrv_account_ids(te.data) && 
'{303,403,503,301,506,8805604}'::text[] 

 ;
  QUERY 
PLAN
---
 Gather  (cost=1236.73..103675.01 rows=11291 width=4) (actual 
time=11.356..41.246 rows=9303 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=23838
   ->  Parallel Bitmap Heap Scan on task_execution te  (cost=236.73..101545.91 
rows=4705 width=4) (actual time=6.659..34.198 rows=3101 loops=3)
 Recheck Cond: (repsrv_account_ids(data) && 
'{303,403,503,301,506,8805604}'::text[])
 Filter: ((create_date >= '2024-07-01 00:00:00+00'::timestamp with time 
zone) AND (create_date <= '2024-09-05 00:00:00+00'::timestamp with time zone))
 Rows Removed by Filter: 5638
 Heap Blocks: exact=14066
 Buffers: shared hit=23838
 ->  Bitmap Index Scan on task_execution__account_ids  
(cost=0.00..233.91 rows=26469 width=0) (actual time=7.304..7.304 rows=26218 
loops=1)
   Index Cond: (repsrv_account_ids(data) && 
'{303,403,503,301,506,8805604}'::text[])
   Buffers: shared hit=32
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.188 ms
 Execution Time: 41.791 ms
(17 rows)

CREATE INDEX task_execution__create_date_brin ON task_execution USING brin 
(create_date) WITH (pages_per_range=8);
CREATE INDEX

explain (analyze,buffers) SELECT 1  

  
FROM task_execution te  

WHERE (te.create_date 
BETWEEN '2024-7-1'::timestamptz AND '2024-9-5'::timestamptz)

  and repsrv_account_ids(te.data) && 
'{303,403,503,301,506,8805604}'::text[] 

 ;

 QUERY PLAN

Re: protocol-level wait-for-LSN

2024-10-28 Thread Heikki Linnakangas

On 28/10/2024 17:51, Peter Eisentraut wrote:
This is something I hacked together on the way back from pgconf.eu. It's 
highly experimental.


The idea is to do the equivalent of pg_wal_replay_wait() on the protocol 
level, so that it is ideally fully transparent to the application code. 
The application just issues queries, and they might be serviced by a 
primary or a standby, but there is always a correct ordering of reads 
after writes.


Additionally, I'm exploring whether this is an idea for a protocol 
extension that might be a bit more complex than, say, longer cancel 
keys, something around which we could have a discussion about protocol 
versioning.


The patch adds a protocol extension called _pq_.wait_for_lsn as well as 
a libpq connection option wait_for_lsn to activate the same.  (Use e.g., 
psql -d 'wait_for_lsn=1'.)


With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.


+1

- The Query message sends an LSN to wait for.  (This doesn't handle the 
extended query protocol yet.)


I'd suggest adding a new message type for this, so that it works the 
same with simple and extended query. Or if you just want to wait without 
issuing any query.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: protocol-level wait-for-LSN

2024-10-28 Thread Jelte Fennema-Nio
On Mon, 28 Oct 2024 at 16:51, Peter Eisentraut  wrote:
> The idea is to do the equivalent of pg_wal_replay_wait() on the protocol
> level, so that it is ideally fully transparent to the application code.
> The application just issues queries, and they might be serviced by a
> primary or a standby, but there is always a correct ordering of reads
> after writes.

Sounds super useful. This came up in the Unconference session about
protocols on PGConf.dev too. I'll

> There might be other ways to slice this.  Instead of using a
> hypothetical middleware, the application would use two connections, one
> for writing, one for reading, and the LSN would be communicated between
> the two.  I imagine in this case, at least the one half of the protocol,
> shipping the current LSN with ReadyForQuery, could be useful, instead of
> requiring application code to issue pg_current_wal_insert_lsn() explicitly.

I think this usecase is already super useful by itself. And having
both directions would still be preferred I think.




Should we support casting between ARRAYs and JSON(B)?

2024-10-28 Thread Aleksander Alekseev
Hi hackers,

While reviewing another patch [1] I came to an idea to try something stupid:

=# select '{1,2,3}' :: int[];
  int4
-
 {1,2,3}
=# select '{1,2,3}' :: int[] :: jsonb[];
ERROR:  cannot cast type integer[] to jsonb[]

=# select '[1,2,3]' :: jsonb;
   jsonb
---
 [1, 2, 3]
=# select '[1,2,3]' :: jsonb :: int[];
ERROR:  cannot cast type jsonb to integer[]

Does anyone believe that this should work and/or would be convenient
if it worked? I can imagine cases when one would like to use array_*
functions for JSON(B) although personally I didn't encounter such a
case (yet?).

Thoughts?

[1]: 
https://postgr.es/m/CAEG8a3J41a4dpw_-F94fF-JPRXYxw-GfsgoGotKcjs9LVfEEvw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: New "raw" COPY format

2024-10-28 Thread Masahiko Sawada
On Mon, Oct 28, 2024 at 3:21 AM Joel Jacobson  wrote:
>
> On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
> > On Mon, Oct 28, 2024, at 08:56, jian he wrote:
> >>   /* Check force_quote */
> >> - if (!opts_out->csv_mode && (opts_out->force_quote ||
> >> opts_out->force_quote_all))
> >> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
> >> + opts_out->force_quote_all))
> >>   ereport(ERROR,
> >>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
> >>
> >> maybe this has a code indentation issue.
> >> since "if" and "opts_out" in the same column position.
> >
> > Thanks for review.
> >
> > I've fixed the indentation issues.
>
> I've now installed pgindent, and will use it from hereon, to avoid this class 
> of problems.
>
> New version where all three patches are now indented using pgindent.

Thank you for updating the patch. Here are review comments on the v15
0002 patch:

When testing the patch with an empty delimiter, I got the following failure:

postgres(1:903898)=# copy hoge from '/tmp/tmp.raw' with (format 'raw',
delimiter '');
TRAP: failed Assert("delim_len > 0"), File: "copyfromparse.c", Line:
1173, PID: 903898

---
-   else
+   else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+   else
+   {
+   elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+   pg_unreachable();
+   }

Since we already check the incompatible options with COPY_FORMAT_RAW
and default_print, I think it's better to add an assertion to make
sure the format is either COPY_FORMAT_CSV or COPY_FORMAT_TEXT, instead
of using elog(ERROR).

---
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)

This function has a lot of duplication with CopyReadLineText(). I
think it's better to modify CopyReadLineText() to support 'raw'
format, rather than adding a separate function.

---
+   boolread_entire_file = (cstate->opts.delim == NULL);
+   int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;

I think we can use 'delim_len == 0' instead of read_entire_file.

---
+   if (read_entire_file)
+   {
+   /* Continue until EOF if reading entire file */
+   input_buf_ptr++;
+   continue;
+   }

In the case where we're reading the entire file as a single tuple, we
don't need to advance the input_buf_ptr one by one. Instead,
input_buf_ptr can jump to copy_buf_len, which is faster.

---
+   /* Check for delimiter, possibly multi-byte */
+   IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+   if (strncmp(©_input_buf[input_buf_ptr], cstate->opts.delim,
+   delim_len) == 0)
+   {
+   cstate->eol_type = EOL_CUSTOM;
+   input_buf_ptr += delim_len;
+   break;
+   }
+   input_buf_ptr++;

Similar to the above comment, I think we don't need to check the char
one by one. I guess that it would be faster if we locate the delimiter
string in the input_buf (e.g. using strstr()), and then move
input_buf_ptr to the detected position.
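
As a rough sketch of that idea (standard C only, with an illustrative helper
name; this is not what copyfromparse.c does today, and the real code would
also have to handle a delimiter that straddles a buffer refill), the search
could look something like this:

#include <stddef.h>
#include <string.h>

/*
 * Sketch only: locate a multi-byte delimiter in the buffered input in one
 * pass instead of advancing one character at a time.  Returns the offset
 * of the delimiter, or -1 if it is not (fully) present in the buffer.
 */
static long
find_delimiter(const char *buf, size_t buf_len,
               const char *delim, size_t delim_len)
{
    const char *p = buf;
    const char *end = buf + buf_len;

    while (delim_len > 0 && (size_t) (end - p) >= delim_len)
    {
        /* only search positions that leave room for the whole delimiter */
        const char *hit = memchr(p, delim[0],
                                 (size_t) (end - p) - delim_len + 1);

        if (hit == NULL)
            break;
        if (memcmp(hit, delim, delim_len) == 0)
            return (long) (hit - buf);
        p = hit + 1;
    }
    return -1;
}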

---
+   /* Copy the entire line into attribute_buf */
+   memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+  cstate->line_buf.len);
+   cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+   cstate->attribute_buf.len = cstate->line_buf.len;

The CopyReadAttributesRaw() just copies line_buf data to
attirbute_buf, which seems to be a waste. I think we can have
attribute_buf point to the line_buf. That way, we can skip the whole
step 4 that is described in the comment on top o f copyfromparse.c:

* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
*1.  2.3.   4.

---
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+   /* Enforce single column requirement */
+   if (cstate->max_fields != 1)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("COPY with format 'raw' must specify exactly
one column")));
+   }

This check should have already been done in BeginCopyFrom(). Is there
any case where max_fields gets to != 1 during reading the input?

---
It's a bit odd to me to use the delimiter as an EOL marker in raw
format, but probably it's okay.

---
-   if (cstate->opts.format != COPY_FORMAT_BINARY)
+   if (cstate->opts.format == COPY_FORMAT_RAW &&
+   cstate->opts.delim != NULL)
+   {
+   /* Output the user-specified delimiter between rows */
+   CopySendString(cstate, cstate->opts.delim);
+   }
+   else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+cstate->opts.format == COPY_FORMAT_CSV)

Since it sends the delimiter 

Re: Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Daniel Gustafsson
> On 28 Oct 2024, at 13:38, Joel Jacobson  wrote:
> 
> Hi hackers,
> 
> I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
> during test failures. Currently, it uses note to print the header:
> 
>note "=== contents of $log ===\n";
> 
> but print for the log content and footer:
> 
>print slurp_file($log);
>print "=== EOF ===\n";

Ugh, nice catch.

> Option 1: Keep output together using note

> Option 2: Adjust header message for separate logs

> Thoughts on these options?

I would prefer to output this to the log only and not the TAP output, to avoid
the risk of not seeing the test output for all the log output on the screen.

--
Daniel Gustafsson





Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Ranier Vilela
Em seg., 28 de out. de 2024 às 09:13, Heikki Linnakangas 
escreveu:

> On 09/10/2024 19:16, Ranier Vilela wrote:
> > Em ter., 8 de out. de 2024 às 18:28, Nathan Bossart
> > mailto:nathandboss...@gmail.com>> escreveu:
> >
> > On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
> >  > The port function *bsearch_arg* mimics the C function
> >  > *bsearch*.
> >  >
> >  > The API signature is:
> >  > void *
> >  > bsearch_arg(const void *key, const void *base0,
> >  > size_t nmemb, size_t size,
> >  > int (*compar) (const void *, const void *, void *),
> >  > void *arg)
> >  >
> >  > So, the parameter *nmemb* is size_t.
> >  > Therefore, a call with nmemb greater than INT_MAX is possible.
> >  >
> >  > Internally the code uses the *int* type to iterate through the
> > number of
> >  > members, which makes overflow possible.
> >
> > I traced this back to commit bfa2cee (v14), which both moved
> > bsearch_arg()
> > to its current location and adjusted the style a bit.  Your patch
> looks
> > reasonable to me.
> >
> > Thanks for looking.
>
> Committed, thanks.
>
Thank you.
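
For the archives, the overflow-safe shape being discussed is roughly the
following; this is a sketch of the pattern with size_t arithmetic throughout,
not the committed patch.

#include <stddef.h>

static void *
bsearch_arg_sketch(const void *key, const void *base0,
                   size_t nmemb, size_t size,
                   int (*compar) (const void *, const void *, void *),
                   void *arg)
{
    const char *base = (const char *) base0;

    while (nmemb > 0)
    {
        size_t      half = nmemb / 2;           /* no int truncation */
        const char *mid = base + half * size;
        int         cmp = compar(key, mid, arg);

        if (cmp == 0)
            return (void *) mid;
        if (cmp > 0)
        {
            base = mid + size;                  /* search the upper half */
            nmemb -= half + 1;
        }
        else
            nmemb = half;                       /* search the lower half */
    }
    return NULL;
}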


>
> Based on the original discussion on bfa2cee, I couldn't figure out where
> exactly this new bsearch implementation originated from, but googling
> around, probably *BSD or libiberty. Tomas, do you remember? Not that it
> matters, but I'm curious.
>
> Some of those other implementations have fixed this, others have not.
> And they all seem to also have the "involes" typo in the comment that we
> fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
> fix to those other projects too.
>
Sure, I can try.

best regards,
Ranier Vilela


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
On Mon, Oct 28, 2024 at 8:47 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I resumed reviewing the patch set.
> Here are only cosmetic comments as my rehabilitation.
>
> 01. getPublications()
>
> I feel we could follow the notation like getSubscriptions(), because number of
> parameters became larger. How do you feel like attached?
>

I will handle this comment in a later set of patches.

> 02. fetch_remote_table_info()
>
> ```
>   "SELECT DISTINCT"
> - "  (CASE WHEN (array_length(gpt.attrs, 1) = 
> c.relnatts)"
> - "   THEN NULL ELSE gpt.attrs END)"
> + "  (gpt.attrs)"
> ```
>
> I think no need to separate lines and add bracket. How about like below?
>
> ```
>  "SELECT DISTINCT gpt.attrs"
> ```
>
Fixed this.

The v44 version patches attached at [1] have the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CAHv8RjLvr8ZxX-1TcaxrZns1nwgrVUTO_2jhDdOPys0WgrDyKQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Ashutosh Bapat
On Mon, Oct 28, 2024 at 10:17 AM Andrei Lepikhov  wrote:
>
> On 10/28/24 03:15, Yasir wrote:
> > By design of my solution, I was not taking it as a bug. But now, I agree
> > with your opinion.
> I think the case provided by Ashutosh was initially correct, and nothing
> needs to change. Look at the similar case:
>
> EXPLAIN SELECT x,y FROM (
>SELECT oid,relname FROM pg_class WHERE relname = 'pg_index') AS
> c(x,y) WHERE c.y = 'pg_index';
>
>   QUERY PLAN
>
> 
>   Index Scan using pg_class_relname_nsp_index on pg_class
> (cost=0.27..8.29 rows=1 width=68)
> Index Cond: (relname = 'pg_index'::name)
> (2 rows)
>
> I don't see any reference to the alias c(x,y) in this explain.
> Similarly, the flattened VALUES clause shouldn't be referenced under the
> alias t(a,b).

The reason you don't see c(x, y) is because the subquery gets pulled
up and the subquery with c(x, y) no longer exists. If the subquery
doesn't get pulled up, you would see c(x, y) in the EXPLAIN plan.

Our syntax doesn't allow an alias to be attached to VALUES(). E.g.
select * from values (1), (2) x(a) is not allowed. Instead we allow
(values (1), (2)) x(a) where values (1), (2) is treated as a subquery.
Since there is no way to attach an alias to VALUES() itself, I think
it's fair to consider the outer alias as the alias of the VALUES
relation. That's what Tom's patch does. The result is useful as well.

The patch looks good to me, except the name of the new member.

  CommonTableExpr *p_parent_cte; /* this query's containing CTE */
+ Alias*p_parent_alias; /* parent's alias for this query */

the two "parent"s here mean different things and that might lead one
to assume that p_parent_alias refers to the alias of the CTE. The comment
adds to the confusion since it mentions "parent". How about renaming it
to p_outer_alias, or something which indicates the alias of the outer
query?

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-10-28 Thread Daniel Gustafsson
> On 25 Oct 2024, at 20:22, Jacob Champion  
> wrote:

> I have combed almost all of Daniel's feedback backwards into the main
> patch (just the new bzero code remains, with the open question
> upthread),

Re-reading I can't see a vector there, I guess I am just scarred from what
seemed to be harmless leaks in auth codepaths and treat every bit as
potentially important.  Feel free to drop from the patchset for now.

> Next up is, hopefully, url-encoding. I hadn't realized what an
> absolute mess that would be [1].

Everything and anything involving urls is a hot mess =/

Looking more at the patchset I think we need to apply conditional compilation
of the backend for oauth like how we do with other opt-in schemes in configure
and meson.  The attached .txt has a diff for making --with-oauth a requirement
for compiling support into backend libpq.

--
Daniel Gustafsson

diff --git a/configure b/configure
index 39fe5a0542..b40cd836f1 100755
--- a/configure
+++ b/configure
@@ -8439,9 +8439,6 @@ fi
 
 if test x"$with_oauth" = x"curl"; then
 
-$as_echo "#define USE_OAUTH 1" >>confdefs.h
-
-
 $as_echo "#define USE_OAUTH_CURL 1" >>confdefs.h
 
   # OAuth requires python for testing
diff --git a/configure.ac b/configure.ac
index f7e1400b6e..82217e652e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -927,8 +927,7 @@ if test x"$with_oauth" = x"" ; then
 fi
 
 if test x"$with_oauth" = x"curl"; then
-  AC_DEFINE([USE_OAUTH], 1, [Define to 1 to build with OAuth 2.0 support. 
(--with-oauth)])
-  AC_DEFINE([USE_OAUTH_CURL], 1, [Define to 1 to use libcurl for OAuth 
support.])
+  AC_DEFINE([USE_OAUTH_CURL], 1, [Define to 1 to build with OAuth 2.0 support. 
(--with-oauth=curl)])
   # OAuth requires python for testing
   if test "$with_python" != yes; then
 AC_MSG_WARN([*** OAuth support tests requires --with-python to run])
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 98eb2a8242..ba502b0442 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -15,7 +15,6 @@ include $(top_builddir)/src/Makefile.global
 # be-fsstubs is here for historical reasons, probably belongs elsewhere
 
 OBJS = \
-   auth-oauth.o \
auth-sasl.o \
auth-scram.o \
auth.o \
@@ -30,6 +29,10 @@ OBJS = \
pqmq.o \
pqsignal.o
 
+ifneq ($(with_oauth),)
+OBJS += auth-oauth.o
+endif
+
 ifeq ($(with_ssl),openssl)
 OBJS += be-secure-openssl.o
 endif
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 0cf3e31c9f..57638f922e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -29,7 +29,6 @@
 #include "libpq/auth.h"
 #include "libpq/crypt.h"
 #include "libpq/libpq.h"
-#include "libpq/oauth.h"
 #include "libpq/pqformat.h"
 #include "libpq/sasl.h"
 #include "libpq/scram.h"
@@ -201,6 +200,14 @@ static int CheckRADIUSAuth(Port *port);
 static int PerformRadiusTransaction(const char *server, const char 
*secret, const char *portstr, const char *identifier, const char *user_name, 
const char *passwd);
 
 
+/*
+ * OAuth Authentication
+ *
+ */
+#ifdef USE_OAUTH
+#include "libpq/oauth.h"
+#endif
+
 /*
  * Global authentication functions
  *
@@ -614,8 +621,13 @@ ClientAuthentication(Port *port)
case uaTrust:
status = STATUS_OK;
break;
+
case uaOAuth:
+#ifdef USE_OAUTH
status = CheckSASLAuth(&pg_be_oauth_mech, port, NULL, 
NULL);
+#else
+   Assert(false);
+#endif
break;
}
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index c623b8463d..719c7a881e 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1749,7 +1749,11 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
else if (strcmp(token->string, "radius") == 0)
parsedline->auth_method = uaRADIUS;
else if (strcmp(token->string, "oauth") == 0)
+#ifdef USE_OAUTH
parsedline->auth_method = uaOAuth;
+#else
+   unsupauth = "oauth";
+#endif
else
{
ereport(elevel,
diff --git a/src/backend/libpq/meson.build b/src/backend/libpq/meson.build
index c85527fb01..1c76dd80cc 100644
--- a/src/backend/libpq/meson.build
+++ b/src/backend/libpq/meson.build
@@ -1,7 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 backend_sources += files(
-  'auth-oauth.c',
   'auth-sasl.c',
   'auth-scram.c',
   'auth.c',
@@ -17,6 +16,10 @@ backend_sources += files(
   'pqsignal.c',
 )
 
+if oauth.found()
+  backend_sources += files('auth-oauth.c')
+endif
+
 if ssl.found()
   backend_sources += files('be-secure-openssl.c')
 endif
diff --git a/src/include/pg_config.h.i

Re: Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 13:44, Daniel Gustafsson wrote:
>> On 28 Oct 2024, at 13:38, Joel Jacobson  wrote:
>> 
>> Hi hackers,
>> 
>> I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
>> during test failures. Currently, it uses note to print the header:
>> 
>>note "=== contents of $log ===\n";
>> 
>> but print for the log content and footer:
>> 
>>print slurp_file($log);
>>print "=== EOF ===\n";
>
> Ugh, nice catch.
>
>> Option 1: Keep output together using note
>
>> Option 2: Adjust header message for separate logs
>
>> Thoughts on these options?
>
> I would prefer to output this to the log only and not the TAP output, to avoid
> the risk of not seeing the test output for all the log output on the screen.

I also think that's best, and in line with what we do in other parts of the 
same script.

Patch attached.

/Joel

0001-Fix-inconsistent-output-handling-in-002_pg_upgrade.p.patch
Description: Binary data


protocol-level wait-for-LSN

2024-10-28 Thread Peter Eisentraut
This is something I hacked together on the way back from pgconf.eu. 
It's highly experimental.


The idea is to do the equivalent of pg_wal_replay_wait() on the protocol 
level, so that it is ideally fully transparent to the application code. 
The application just issues queries, and they might be serviced by a 
primary or a standby, but there is always a correct ordering of reads 
after writes.


Additionally, I'm exploring whether this is an idea for a protocol 
extension that might be a bit more complex than, say, longer cancel 
keys, something around which we could have a discussion about protocol 
versioning.


The patch adds a protocol extension called _pq_.wait_for_lsn as well as 
a libpq connection option wait_for_lsn to activate the same.  (Use e.g., 
psql -d 'wait_for_lsn=1'.)


With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.

- The Query message sends an LSN to wait for.  (This doesn't handle the 
extended query protocol yet.)


To make any real use of this, you'd need some middleware, like a hacked 
pgbouncer, that transparently redirects queries among primaries and 
standbys, which doesn't exist yet.  But if it did, I imagine it could be 
pretty useful.


There might be other ways to slice this.  Instead of using a 
hypothetical middleware, the application would use two connections, one 
for writing, one for reading, and the LSN would be communicated between 
the two.  I imagine in this case, at least the one half of the protocol, 
shipping the current LSN with ReadyForQuery, could be useful, instead of 
requiring application code to issue pg_current_wal_insert_lsn() explicitly.


Thoughts?From 44b6354429847e3b3aeac21ee5712879b97d7877 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 26 Oct 2024 01:11:57 +0200
Subject: [PATCH v0] wait_for_lsn protocol option

---
 src/backend/tcop/backend_startup.c  | 15 +--
 src/backend/tcop/dest.c | 12 
 src/backend/tcop/postgres.c | 23 +++
 src/include/libpq/libpq-be.h|  1 +
 src/interfaces/libpq/fe-connect.c   | 26 ++
 src/interfaces/libpq/fe-exec.c  |  1 +
 src/interfaces/libpq/fe-protocol3.c | 20 
 src/interfaces/libpq/fe-trace.c |  2 ++
 src/interfaces/libpq/libpq-int.h|  3 +++
 9 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/src/backend/tcop/backend_startup.c 
b/src/backend/tcop/backend_startup.c
index 2a96c81f925..bd3b91d01eb 100644
--- a/src/backend/tcop/backend_startup.c
+++ b/src/backend/tcop/backend_startup.c
@@ -768,12 +768,23 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool 
gss_done)
valptr),
 errhint("Valid values 
are: \"false\", 0, \"true\", 1, \"database\".")));
}
+   else if (strcmp(nameptr, "_pq_.wait_for_lsn") == 0)
+   {
+   if (strcmp(valptr, "1") == 0)
+   port->wait_for_lsn_enabled = true;
+   else
+   ereport(FATAL,
+   
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("invalid value 
for parameter \"%s\": \"%s\"",
+   
"wait_for_lsn",
+   valptr),
+errhint("Valid values 
are: 1.")));
+   }
else if (strncmp(nameptr, "_pq_.", 5) == 0)
{
/*
 * Any option beginning with _pq_. is reserved 
for use as a
-* protocol-level option, but at present no 
such options are
-* defined.
+* protocol-level option.
 */
unrecognized_protocol_options =
lappend(unrecognized_protocol_options, 
pstrdup(nameptr));
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index 96f80b30463..bb9910b12d5 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -31,6 +31,8 @@
 #include "access/printsimple.h"
 #include "access/printtup.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "commands/copy.h"
 #include "commands/createas.h"
 #include "commands/explain.h"
@@ -40,6 +42,7 @@
 #include "executor/tstoreReceiver.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
+#include "miscadmin.h"
 
 
 /* 
@@ -265,6 +268,15 @@ Re

Re: protocol-level wait-for-LSN

2024-10-28 Thread Jelte Fennema-Nio
On Mon, 28 Oct 2024 at 17:58, Heikki Linnakangas  wrote:
> > - The Query message sends an LSN to wait for.  (This doesn't handle the
> > extended query protocol yet.)
>
> I'd suggest adding a new message type for this, so that it works the
> same with simple and extended query. Or if you just want to wait without
> issuing any query.

I imagine a libpq interface like this.

lsn = PQcurrentLSN(primaryConn)
PQsendWaitLSN(secondaryConn, lsn)
PQsendQuery(secondaryConn, ...)

One thing I'm wondering is if the current lsn could be a read-only GUC
that is reported through ParameterStatus. Because a downside of making
it part of ReadyForQuery is that you only get a ReadyForQuery at the
end of a pipeline, while a pipeline can contain multiple commits if
you use explicit BEGIN/COMMIT in your pipeline. It might be nice to be
able to wait on those commits before you've received ReadyForQuery. On
the other hand, that seems like a rather exotic usecase that maybe is
not worth thinking about too much.
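
Spelled out a little more, the flow with such an interface might look like
this; PQcurrentLSN() and PQsendWaitLSN() are the hypothetical calls imagined
above and do not exist in libpq today.

#include <libpq-fe.h>

/*
 * Sketch of a read-your-writes pattern with the *hypothetical*
 * PQcurrentLSN()/PQsendWaitLSN() interface; neither function exists in
 * libpq today, and error handling is omitted for brevity.
 */
static void
read_your_writes(PGconn *primary, PGconn *standby)
{
    PGresult   *res;
    const char *lsn;

    /* Write on the primary. */
    res = PQexec(primary, "INSERT INTO t VALUES (1)");
    PQclear(res);

    /* LSN the primary reported via ReadyForQuery (hypothetical accessor). */
    lsn = PQcurrentLSN(primary);

    /* Ask the standby to wait for that LSN before running the read. */
    PQsendWaitLSN(standby, lsn);                /* hypothetical call */
    res = PQexec(standby, "SELECT * FROM t");   /* now sees the insert */
    PQclear(res);
}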




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Oh, another comment for the v24-0001 patch: we are in heap AM now, so should
we use something like HEAP_INSERT_BAS_BULKWRITE instead of using a table AM
option, just like other heap AM options do?

> + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> + istate->bistate = GetBulkInsertState();


—

Regards, Jingtang





Re: [PATCH] Add sortsupport for range types and btree_gist

2024-10-28 Thread Bernd Helmle
Am Samstag, dem 26.10.2024 um 11:40 +0300 schrieb Andrey M. Borodin:
> First patch:
> 0. We have PG_FREE_IF_COPY(), does it suits your needs?

PG_FREE_IF_COPY() requires FunctionCallInfo (the macro uses
PG_GETARG_POINTER() to reference the pointer value to compare), which
is not usable in the context we need it.

> 1. Tests do not check what actual build method is used. You can add
> INJECTION_POINT("gist-sort-build") and expect a notice there or
> something like that.

Hmm, good idea, I'll give it a try.

> 2. "Per default" -> "By default", "indexe" -> "index", "index
> quality" -> NULL (in 14 quality was bad, since 15 same "quality")
> 
> Second patch:
> 0. PG_FREE_IF_COPY in range_gist_cmp? :)

See above.

> 1. Some test of this new functionality?

Right, but I am unsure where this belongs. Perhaps
src/test/regress/sql/rangetypes.sql? We can extend the GiST tests
there.




Re: Should we support casting between ARRAYs and JSON(B)?

2024-10-28 Thread Marcos Pegoraro
Em seg., 28 de out. de 2024 às 14:06, Aleksander Alekseev <
aleksan...@timescale.com> escreveu:

> =# select '[1,2,3]' :: jsonb :: int[];
>

I think it would be useful; casting int[] to json is not hard

select to_json('{1,5,9,12}'::int[]);

but casting a json array to int[] is not that easy:

select js,
   js->'items',
   translate(js->>'items','[]','{}')::int[],
   5 = any(translate(js->>'items','[]','{}')::int[])

   --This one would be cool, doesn't need translate or any other trick
   --5 = any(js->'items'::int[])

from (select jsonb_build_object('items','{1,5,9,12}'::int[])) x(js);

So, if you can cast one way, you should be able to cast both ways.

regards
Marcos


Re: general purpose array_sort

2024-10-28 Thread Aleksander Alekseev
Hi,

> Based on the previous discussion, I split it into two patches in V8.
>
> 0001 is the general sort part without `is_ascending` or `nulls_first`,
> the sort order is determined by the "<" operator of the element type.
> It also cached the type entry of both eletyp and the corresponding
> array type.
>
> 0002 adds the `is_ascending` and `nulls_first` part, it now uses
> two boolean parameters instead of parsing one text parameter.

Thanks for the updated patch set. Here are some comments.

0001:

> +{ oid => '8810', descr => 'sort array',
> +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> +  proargtypes => 'anyarray', prosrc => 'array_sort'},

I would expect that array_sort() should be IMMUTABLE. Is there a
reason for it to be VOLATILE?

> +array_sort ( anyarray  
> COLLATE collation_name )
> +anyarray

It seems to me that the part about using COLLATE should be moved
below, to the description / examples section, since it's not part of
the function signature.

Also, the description should be more specific about how NULLs are
sorted. NULLs should also be covered by tests.

0002:

> is_ascending

I really believe this name is not the best one. I suggest using
`reverse => true`. `nulls_first` is OK.

> +Datum
> +array_sort_order(PG_FUNCTION_ARGS)
> +{
> +return array_sort(fcinfo);
> +}
> +
> +Datum
> +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> +{
> +return array_sort(fcinfo);
> +}

Any reason not to specify array_sort in pg_proc.dat?

The tests cover is_ascending => true | false, which is OK, but only the
combinations (is_ascending => true, nulls_first => true) and
(is_ascending => false, nulls_first => false). For the case when both
optional arguments are specified, you have to test at least 4 combinations.

-- 
Best regards,
Aleksander Alekseev




Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Heikki Linnakangas

On 28/10/2024 17:30, Tomas Vondra wrote:

Some of those other implementations have fixed this, others have not.
And they all seem to also have the "involes" typo in the comment that we
fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
fix to those other projects too.


Thanks for fixing this, although I wonder if we can actually hit this,
as we don't really allocate more than 1GB in most places. But it's
possible, and the pre-bfa2cee code handled it fine.


Yeah, I didn't check closely, but I'm pretty sure none of the current
call sites can pass anything near INT_MAX elements.


While we're at it, there's this in dicts/spell.h:


/*
 * Structure to store Hunspell options. Flag representation depends on flag
 * type. These flags are about support of compound words.
 */
typedef struct CompoundAffixFlag
{
union
{
/* Flag name if flagMode is FM_CHAR or FM_LONG */
const char *s;
/* Flag name if flagMode is FM_NUM */
uint32  i;
}   flag;
/* we don't have a bsearch_arg version, so, copy FlagMode */
FlagModeflagMode;
uint32  value;
} CompoundAffixFlag;


We have bsearch_arg() now, so we could switch to that and remove 
'flagMode' from here.


--
Heikki Linnakangas
Neon (https://neon.tech)





Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Joel Jacobson
Hi hackers,

I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
during test failures. Currently, it uses note to print the header:

note "=== contents of $log ===\n";

but print for the log content and footer:

print slurp_file($log);
print "=== EOF ===\n";

This results in the header appearing in the TAP test output, while the actual
log content and the EOF-footer are directed to
./build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade

This split means the log content is separate from its header and footer,
making it harder to interpret during test failures.

Example of current behavior:

In the console output:
# executing test in ./build/testrun/pg_upgrade/002_pg_upgrade group pg_upgrade test 002_pg_upgrade
...omitted...
not ok 15 - pg_upgrade_output.d/ removed after pg_upgrade success
# === contents of ./build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20241028T140145.267/log/pg_upgrade_dump_5.log ===
# === contents of ./build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20241028T140145.267/log/pg_upgrade_dump_1.log ===

But the actual log content and footers are only visible in 
regress_log_002_pg_upgrade.

Suggested solutions:

Option 1: Keep output together using note

If we want all messages (header, content, and footer) to be in the TAP output
for easier readability, we could replace print with note for both the content
and footer:

   foreach my $log (@log_files)
   {
   note "=== contents of $log ===\n";
   note slurp_file($log);
   note "=== EOF ===\n";
   }

Option 2: Adjust header message for separate logs

If we intentionally want the log content to remain in the separate regression
log for brevity, we could clarify the header message to indicate where
the actual content can be found, and ensure the "=== contents of $log ===\n"
message goes into the same file as the content:

   foreach my $log (@log_files)
   {
   note "=== contents of $log logged to $path_to_regress_log_002_pg_upgrade 
===\n";
   print "=== contents of $log ===\n";
   print slurp_file($log);
   print "=== EOF ===\n";
   }

Thoughts on these options?

/Joel




Re: Vacuum statistics

2024-10-28 Thread Alexander Korotkov
On Sun, Sep 29, 2024 at 12:22 AM Alena Rybakina
 wrote:
> Hi! Thank you for your interest in this patch!
>
> I took a very brief look at this and was wondering if it was worth
> having a way to make the per-table vacuum statistics opt-in (like a
> table storage parameter) in order to decrease the shared memory
> footprint of storing the stats.
>
> I'm not sure how users can select tables that enable vacuum statistics
> as I think they basically want to have statistics for all tables, but
> I see your point. Since the size of PgStat_TableCounts approximately
> tripled by this patch (112 bytes to 320 bytes), it might be worth
> considering ways to reduce the number of entries or reducing the size
> of vacuum statistics.
>
> The main purpose of these statistics is to see abnormal behavior of vacuum in 
> relation to a table or the database as a whole.
>
> For example, there may be a situation where vacuum has started to run more 
> often and spends a lot of resources on processing a certain index, but the 
> size of the index does not change significantly. Moreover, the table in which 
> this index is located can be much smaller in size. This may be because the 
> index is bloated and needs to be reindexed.
>
> This is exactly what vacuum statistics can show - we will see that compared 
> to other objects, vacuum processed more blocks and spent more time on this 
> index.
>
> Perhaps the vacuum parameters for the index should be set more aggressively 
> to avoid this in the future.
>
> I suppose that if we turn off statistics collection for a certain object, we 
> can miss it. In addition, the user may not enable the parameter for the 
> object in time, because he will forget about it.

I agree with this point.  Additionally, in order to benefit from
gathering vacuum statistics only for some relations in terms of
space, we need to handle variable-size stat entries.  That would
greatly increase the complexity.

> As for the second option, now I cannot say which statistics can be removed, 
> to be honest. So far, they all seem necessary.

Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
tripled in size.  That's a huge change from having no statistics on
vacuum to having them in much more detail than everything else we
currently have.  I think the feasible way might be to introduce the
most-demanded statistics first and then see how it goes.

--
Regards,
Alexander Korotkov
Supabase




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-28 Thread Alexander Lakhin

28.10.2024 19:06, Tom Lane wrote:

I've also dumped buf in read_whole_file() and found that in both
PG_BINARY_R and "r" modes the 0d 0a ending is preserved. But it changed
to 0a with the "rt" mode (see [1]), and it makes the test (and the whole
`meson test`) pass for me.

Interesting.  I believe we decided years ago that we didn't need to
use "rt" mode because that was the default on Windows, but was that
a misreading of the documentation?  The link you provided doesn't
give any hint that there are more than two behaviors.

However ... the link you provided also mentions that text mode
includes treating control-Z as EOF; which I'd forgotten, but it
does make it less safe than I thought to use text mode for
reading script files.


I think this other behavior can be explained by the pgwin32_fopen()/
pgwin32_open() coding (O_TEXT is assumed implicitly only #ifdef FRONTEND).

Anyway, as you noticed, \x1A injected into test_extsql really leads to
truncation of the file contents on read (with "rt"), so I agree that using
the text/translation mode here is not an improvement.


What I'm now thinking is that we should revert 924e03917 after
all (that is, go back to using PG_BINARY_R) and instead make
read_whole_file manually squash \r\n to \n if we're on Windows.
Ugly, but I have yet to find anything about that platform that
isn't.


Yes, I think this should help.

Best regards,
Alexander
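
To illustrate the \r\n squashing Tom describes, here is a minimal standalone
sketch. The function name, signature, and buffer handling are assumptions for
this example only; the real read_whole_file() in src/backend/commands/extension.c
presumably does this in place on its own palloc'd buffer.

#include <stddef.h>

/*
 * Collapse every \r\n pair in buf to a single \n, in place, and return the
 * possibly shorter new length.  Lone \r bytes are left untouched.
 */
static size_t
squash_crlf(char *buf, size_t len)
{
	char	   *src = buf;
	char	   *dst = buf;

	while (src < buf + len)
	{
		if (src[0] == '\r' && src + 1 < buf + len && src[1] == '\n')
			src++;				/* drop the \r; the \n is copied below */
		*dst++ = *src++;
	}

	return dst - buf;
}

Guarding the call with #ifdef WIN32 would keep the non-Windows read path
byte-for-byte identical to today's behavior.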




Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Tender Wang
On Mon, Oct 28, 2024 at 17:16, Alvaro Herrera wrote:

> On 2024-Oct-27, Tender Wang wrote:
>
> > BTW, while reviewing the v2 patch, I found the parentConTup in
> > foreach(cell, fks) block
> > didn't need it anymore. We can remove the related codes.
>
> True.  Done so in this v3.
>
> I noticed another problem here: we're grabbing the wrong lock type on
> the referenced rel (AccessShareLock) during detach.  (What's more: we
> release it afterwards, which is the wrong thing to do.  We need to keep
> such locks until end of transaction).  I didn't try to construct a case
> where this would be a problem, but if I change AccessShare to NoLock,
> the assertion that says we don't hold _any_ lock on that relation fires,
> which means that we're not taking any locks on those rels before this
> point.  So this lock strength choice is definitely wrong.  I changed it
> to ShareRowExclusive, which is what we're suppose to use when adding a
> trigger.


In CloneFKReferencing(), constrForm->confrelid uses the same lock type.
I think you're right. I didn't find any other problems.


-- 
Thanks,
Tender Wang


RE: Pgoutput not capturing the generated columns

2024-10-28 Thread Zhijie Hou (Fujitsu)
On Monday, October 28, 2024 1:34 PM Amit Kapila  wrote:
> 
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith 
> wrote:
> 
> >
> > 4. pgoutput_column_list_init
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > + if (att->attgenerated)
> > + {
> > + if (bms_is_member(att->attnum, cols)) gencolpresent = true;
> > +
> > + continue;
> > + }
> > +
> >   nliveatts++;
> >   }
> >
> >   /*
> > - * If column list includes all the columns of the table,
> > - * set it to NULL.
> > + * If column list includes all the columns of the table
> > + * and there are no generated columns, set it to NULL.
> >   */
> > - if (bms_num_members(cols) == nliveatts)
> > + if (bms_num_members(cols) == nliveatts && !gencolpresent)
> >   {
> >
> > Something seems not quite right (or maybe redundant) with this logic.
> > For example, because you unconditionally 'continue' for generated
> > columns, then AFAICT it is just not possible for bms_num_members(cols)
> > == nliveatts and at the same time 'gencolpresent' to be true. So you
> > could've just Asserted (!gencolpresent) instead of checking it in the
> > condition and mentioning it in the comment.

I think it's possible for the condition you mentioned to happen.

For example:
 
CREATE TABLE test_mix_4 (a int primary key, b int, d int GENERATED ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub FOR TABLE test_mix_4(a, d);

> >
> 
> It seems part of the logic is redundant. I propose to change something along 
> the
> lines of the attached. I haven't tested the attached change as it shows how we
> can improve this part of code.

Thanks for the changes. I tried them and hit an unexpected behavior:
the walsender reports the error "cannot use different column lists fo.."
in the following case:

Pub:
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED ALWAYS AS (a + 1) STORED);
ALTER TABLE test_mix_4 DROP COLUMN c;
CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
Sub:
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_7, pub_mix_8;

pub_mix_7 publishes only columns a and b, so its column list should be
converted to NULL in pgoutput, but it was not, due to the check of
att_gen_present.

Based on the above, I feel we can keep the original code as it is.

Best Regards,
Hou zj


Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Daniel Gustafsson
> On 17 Oct 2024, at 00:35, Bruce Momjian  wrote:
> On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote:

>> See  https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net
>> 
>> I agree we should fix the docco.
> 
> I have written the attached patch to make these changes.

+1 for applying this, backpatched to at least 17 but possibly further
down, judging by the linked threads.

--
Daniel Gustafsson





Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v43-0001.
>
> ==
>
> 1. Missing docs update?
>
> The CREATE PUBLICATION docs currently says:
> When a column list is specified, only the named columns are
> replicated. If no column list is specified, all columns of the table
> are replicated through this publication, including any columns added
> later.
>
> ~
>
> For this patch, should that be updated to say "... all columns (except
> generated columns) of the table are replicated..."
>
> ==
> src/backend/replication/logical/proto.c
>
> 2.
> +static bool
> +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + return false;
> +
> + if (!column_in_column_list(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> Here, I wanted to suggest that the whole "Skip publishing generated
> columns" if-part is unnecessary because the next check
> (!column_in_column_list) is going to return false for the same
> scenario anyhow.
>
> But, unfortunately, the "column_in_column_list" function has some
> special NULL handling logic in it; this means none of this code is
> quite what it seems to be (e.g. the function name
> column_in_column_list is somewhat misleading)
>
> IMO it would be better to change the column_in_column_list signature
> -- add another boolean param to say if a NULL column list is allowed
> or not. That will remove any subtle behaviour and then you can remove
> the "if (att->attgenerated && !columns)" part.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 3. send_relation_and_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (att->atttypid < FirstGenbkiObjectId)
>   continue;
>
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + continue;
> +
>   /* Skip this attribute if it's not present in the column list */
>   if (columns != NULL && !bms_is_member(att->attnum, columns))
>   continue;
> ~
>
> Most of that code above looks to be doing the very same thing as the
> new 'should_publish_column' in proto.c. Won't it be better to expose
> the other function and share the common logic?
>
> ~~~
>
> 4. pgoutput_column_list_init
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> + if (att->attgenerated)
> + {
> + if (bms_is_member(att->attnum, cols))
> + gencolpresent = true;
> +
> + continue;
> + }
> +
>   nliveatts++;
>   }
>
>   /*
> - * If column list includes all the columns of the table,
> - * set it to NULL.
> + * If column list includes all the columns of the table
> + * and there are no generated columns, set it to NULL.
>   */
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts && !gencolpresent)
>   {
>
> Something seems not quite right (or maybe redundant) with this logic.
> For example, because you unconditionally 'continue' for generated
> columns, then AFAICT it is just not possible for bms_num_members(cols)
> == nliveatts and at the same time 'gencolpresent' to be true. So you
> could've just Asserted (!gencolpresent) instead of checking it in the
> condition and mentioning it in the comment.
>
> ==
> src/test/regress/expected/publication.out
>
> 5.
> --- error: generated column "d" can't be in list
> +-- ok: generated column "d" can be in the list too
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR:  cannot use generated column "d" in publication column list
>
> By allowing the above to work without giving ERROR, I think you've
> broken many subsequent test expected results. e.g. I don't trust these
> "expected" results anymore because I didn't think these next test
> errors should have been affected, right?
>
>  -- error: system attributes "ctid" not allowed in column list
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
> -ERROR:  cannot use system column "ctid" in publication column list
> +ERROR:  relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> ~
>
>  -- error: duplicates not allowed in column list
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
> -ERROR:  duplicate column "a" in publication column list
> +ERROR:  relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> probably more like this...
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 6.
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE test_gen (a int PRIMARY KEY, b int);
> +));
> +
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE 

Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-28 Thread Nathan Bossart
On Sun, Oct 27, 2024 at 09:36:38PM -0700, Masahiko Sawada wrote:
> Thank you for your suggestion. IMHO I'm not sure we need to have a
> regression test for this bug as it's just an oversight of recently
> committed code.

Agreed.  FWIW cfbot seems to catch this already (CTRL+F for "index_drop" in
the highlights page [0]), and I saw it on my laptop shortly after the issue
was reported.

[0] http://cfbot.cputube.org/highlights/all.html

-- 
nathan




Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Tom Lane
Ashutosh Bapat  writes:
> The patch looks good to me, except the name of the new member.

>   CommonTableExpr *p_parent_cte; /* this query's containing CTE */
> + Alias*p_parent_alias; /* parent's alias for this query */

> the two "parent"s here mean different things and that might lead one
> to assume that the p_parent_alias refers to alias of CTE. The comment
> adds to the confusion since it mentions parent. How about renaming it
> as p_outer_alias? or something which indicates alias of the outer
> query?

Hmm, I figured the two "parent" references do mean the same thing,
ie the immediately surrounding syntactic construct.  While I won't
fight hard about it, I don't see an advantage in naming the new
field differently.  We could make the comment be

/* outer level's alias for this query */

if that helps any.

regards, tom lane




Proper object locking for GRANT/REVOKE

2024-10-28 Thread Peter Eisentraut
This patch started out as a refactoring: objectNamesToOids() in
aclchk.c should really be mostly a loop around get_object_address().
This is mostly true, with a few special cases because the node
representations are a bit different in some cases, and
OBJECT_PARAMETER_ACL, which is obviously very different.  This saves a
bunch of duplicative code, which is nice.


Additionally, get_object_address() handles locking, which 
objectNamesToOids() somewhat famously does not do, and there is a code 
comment about it.  With this refactoring, we get the locking pretty much 
for free.


Interestingly, this changes the output of the intra-grant-inplace 
isolation test, which is recent work.  It might be good to get some 
review from those who worked on that, to make sure that the new behavior 
is still correct and/or to check whether those test cases are still 
applicable.


Also, it would be worth thinking about what the correct lock mode should 
be here.
From fb910c6af0fb0691448680ec4a4cc1eb1857cd5b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 28 Oct 2024 16:07:01 +0100
Subject: [PATCH] Proper object locking for GRANT/REVOKE

Refactor objectNamesToOids() to use get_object_address() internally if
possible.  Not only does this save a lot of code, it also allows us to
use the object locking provided by get_object_address() for
GRANT/REVOKE.  There was previously a code comment that complained
about the lack of locking in objectNamesToOids(), which is now fixed.

The check in ExecGrant_Type_check() is obsolete because
get_object_address_type() already does the same check.
---
 src/backend/catalog/aclchk.c  | 162 +-
 .../expected/intra-grant-inplace.out  |   2 +-
 2 files changed, 41 insertions(+), 123 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 95eb0b12277..2f91aa865a3 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -659,147 +659,76 @@ ExecGrantStmt_oids(InternalGrant *istmt)
  * objectNamesToOids
  *
  * Turn a list of object names of a given type into an Oid list.
- *
- * XXX: This function doesn't take any sort of locks on the objects whose
- * names it looks up.  In the face of concurrent DDL, we might easily latch
- * onto an old version of an object, causing the GRANT or REVOKE statement
- * to fail.
  */
 static List *
 objectNamesToOids(ObjectType objtype, List *objnames, bool is_grant)
 {
List   *objects = NIL;
ListCell   *cell;
+   const LOCKMODE lockmode = AccessShareLock;
 
Assert(objnames != NIL);
 
switch (objtype)
{
+   default:
+
+   /*
+* For most object types, we use get_object_address() directly.
+*/
+   foreach(cell, objnames)
+   {
+   ObjectAddress address;
+   Relationrelation;
+
+   address = get_object_address(objtype, lfirst(cell), &relation, lockmode, false);
+   Assert(relation == NULL);
+   objects = lappend_oid(objects, address.objectId);
+   }
+   break;
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
+
+   /*
+* Here, we don't use get_object_address().  It requires that the
+* specified object type match the actual type of the object, but
+* in GRANT/REVOKE, all table-like things are addressed as TABLE.
+*/
foreach(cell, objnames)
{
RangeVar   *relvar = (RangeVar *) lfirst(cell);
Oid relOid;
 
-   relOid = RangeVarGetRelid(relvar, NoLock, false);
+   relOid = RangeVarGetRelid(relvar, lockmode, false);
objects = lappend_oid(objects, relOid);
}
break;
-   case OBJECT_DATABASE:
-   foreach(cell, objnames)
-   {
-   char   *dbname = strVal(lfirst(cell));
-   Oid dbid;
-
-   dbid = get_database_oid(dbname, false);
-   objects = lappend_oid(objects, dbid);
-   }
-   break;
case OBJECT_DOMAIN:
case OBJECT_TYPE:
+
+   /*
+* The parse representation of types and domains in privilege
+* targets is different from that expected by get_object_address()
+ 

Reduce one comparison in binaryheap's sift down

2024-10-28 Thread cca5507
Hi,

I think we can reduce one comparison in binaryheap's sift down, right?


Here is a patch to fix it.


--
Regards,
ChangAo Chen

v1-0001-Reduce-one-comparison-in-binaryheap-s-sift-down.patch
Description: Binary data
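
For readers without the attachment: the usual way to save one comparison per
level in a sift-down is to pick the larger child first and only then compare
it against the parent. Below is a generic, standalone max-heap sketch of that
idea; it is not the attached patch and not PostgreSQL's binaryheap.c (which
goes through a caller-supplied comparator), so take it purely as an
illustration of the technique.

#include <stddef.h>

static void
sift_down(int *heap, size_t n, size_t i)
{
	for (;;)
	{
		size_t		left = 2 * i + 1;
		size_t		right = left + 1;
		size_t		largest;
		int			tmp;

		if (left >= n)
			break;				/* no children, we're done */

		/* one comparison to pick the larger child ... */
		largest = (right < n && heap[right] > heap[left]) ? right : left;

		/* ... and one comparison against the parent */
		if (heap[i] >= heap[largest])
			break;

		/* swap and continue one level down */
		tmp = heap[i];
		heap[i] = heap[largest];
		heap[largest] = tmp;
		i = largest;
	}
}

The naive form instead compares the parent against each child separately,
which can cost up to three comparisons per level.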


Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Tomas Vondra



On 10/28/24 13:13, Heikki Linnakangas wrote:
> On 09/10/2024 19:16, Ranier Vilela wrote:
>> On Tue, Oct 8, 2024 at 18:28, Nathan Bossart <nathandboss...@gmail.com> wrote:
>>
>>     On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
>>  > The port function *bsearch_arg* mimics the C function
>>  > *bsearch*.
>>  >
>>  > The API signature is:
>>  > void *
>>  > bsearch_arg(const void *key, const void *base0,
>>  > size_t nmemb, size_t size,
>>  > int (*compar) (const void *, const void *, void *),
>>  > void *arg)
>>  >
>>  > So, the parameter *nmemb* is size_t.
>>  > Therefore, a call with nmemb greater than INT_MAX is possible.
>>  >
>>  > Internally the code uses the *int* type to iterate through the
>>     number of
>>  > members, which makes overflow possible.
>>
>>     I traced this back to commit bfa2cee (v14), which both moved
>>     bsearch_arg()
>>     to its current location and adjusted the style a bit.  Your patch
>> looks
>>     reasonable to me.
>>
>> Thanks for looking.
> 
> Committed, thanks.
> 
> Based on the original discussion on bfa2cee, I couldn't figure out where
> exactly this new bsearch implementation originated from, but googling
> around, probably *BSD or libiberty. Tomas, do you remember? Not that it
> matters, but I'm curious.
> 

I don't remember, unfortunately :-( I think it was one of the *BSDs,
because of license, but I'm not quite sure why I changed the code at all
during the move.

> Some of those other implementations have fixed this, others have not.
> And they all seem to also have the "involes" typo in the comment that we
> fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
> fix to those other projects too.
> 

Thanks for fixing this, although I wonder if we can actually hit this,
as we don't really allocate more than 1GB in most places. But it's
possible, and the pre-bfa2cee code handled it fine.


regards

-- 
Tomas Vondra





Re: Fix C23 compiler warning

2024-10-28 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Tom Lane  writes:
>> So maybe we should revive that idea, though I'd definitely target
>> autoconf 2.72 not 2.71.

> Just a data point: autoconf 2.72 came out under a year ago, so the most
> recent Debian Stable (12) and Ubuntu LTS (24.04) only have 2.71.

I don't think we care, except to the extent that usage of 2.72 in
widely-used distros would increase confidence in it (which is far
from a trivial consideration).  For many years, we've had a policy
that committers should use autoconf-built-from-GNU-sources rather
than distro packages.  The distros tend to stick in local changes
that affect the output, but we need uniform output so that there's
not random churn in the committed version of the configure script.

Still, we're in wait-and-see mode about C23, so maybe wait-and-see
for awhile longer about autoconf 2.72 as well.

> They both have m4 1.4.19, though.

That's good news anyway.  Per the older thread, building m4 from
source is no fun at all.

regards, tom lane




Re: define pg_structiszero(addr, s, r)

2024-10-28 Thread Heikki Linnakangas

On 18/09/2024 21:57, Bertrand Drouvot wrote:

On Wed, Sep 18, 2024 at 10:03:21AM +0200, Peter Eisentraut wrote:

On 18.09.24 06:16, Bertrand Drouvot wrote:

+#define pg_structiszero(addr, s, r)		\
+   do {		\
+   /* We assume this initializes to zeroes */		\
+   static const s all_zeroes;		\
+   r = (memcmp(addr, &all_zeroes, sizeof(all_zeroes)) == 0);   \
+   } while (0)


Not new with this patch, but do we guarantee padding bytes to be zeros?

How about this instead:

static inline bool
pg_is_all_zeros(const char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
  {
if (p[i] != 0)
return false;
  }
  return true;
}

Is there a de facto standard name for that function? I was surprised
that I couldn't find one with a quick google search. That seems like the
kind of small utility function that every C program needs.


How performance-sensitive is this? If it's not, then the above seems
like the most straightforward way to do it, which is good. If it is
performance-sensitive, it's still good, because the compiler can
optimize it well: https://godbolt.org/z/x9hPWjheq.


--
Heikki Linnakangas
Neon (https://neon.tech)
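
A small usage sketch follows, with the struct and call site invented purely
for illustration (where such a helper would live is not settled in this
thread), assuming the pg_is_all_zeros() definition above is in scope. Note
Peter's point about padding: since padding bytes are not guaranteed to be
zero, this is only reliable on memory that was explicitly zeroed, e.g. with
palloc0() or memset(), or on structs without padding.

#include "postgres.h"

typedef struct DemoCounters
{
	uint64		inserts;
	uint64		updates;
	uint64		deletes;
} DemoCounters;

/* returns true if the counters were never touched after palloc0() */
static bool
demo_counters_untouched(const DemoCounters *c)
{
	return pg_is_all_zeros((const char *) c, sizeof(DemoCounters));
}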





Re: In-place persistence change of a relation

2024-10-28 Thread Heikki Linnakangas

On 31/08/2024 19:09, Kyotaro Horiguchi wrote:

Subject: [PATCH v34 03/16] Remove function for retaining files on outer
 transaction aborts

The function RelationPreserveStorage() was initially created to keep
storage files committed in a subtransaction on the abort of outer
transactions. It was introduced by commit b9b8831ad6 in 2010, but no
use case for this behavior has emerged since then. If we move the
at-commit removal feature of storage files from pendingDeletes to the
UNDO log system, the UNDO system would need to accept the cancellation
of already logged entries, which makes the system overly complex with
no benefit. Therefore, remove the feature.


I don't think that's quite right. I don't think this was meant for 
subtransaction aborts, but to make sure that if the top-transaction 
aborts after AtEOXact_RelationMap() has already been called, we don't 
remove the new relation. AtEOXact_RelationMap() is called very late in 
the commit process to keep the window as small as possible, but if it 
nevertheless happens, the consequences are pretty bad if you remove a 
relation file that is in fact needed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Heikki Linnakangas

On 09/10/2024 19:16, Ranier Vilela wrote:
On Tue, Oct 8, 2024 at 18:28, Nathan Bossart <nathandboss...@gmail.com> wrote:


On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
 > The port function *bsearch_arg* mimics the C function
 > *bsearch*.
 >
 > The API signature is:
 > void *
 > bsearch_arg(const void *key, const void *base0,
 > size_t nmemb, size_t size,
 > int (*compar) (const void *, const void *, void *),
 > void *arg)
 >
 > So, the parameter *nmemb* is size_t.
 > Therefore, a call with nmemb greater than INT_MAX is possible.
 >
 > Internally the code uses the *int* type to iterate through the
number of
 > members, which makes overflow possible.

I traced this back to commit bfa2cee (v14), which both moved
bsearch_arg()
to its current location and adjusted the style a bit.  Your patch looks
reasonable to me.

Thanks for looking.


Committed, thanks.

Based on the original discussion on bfa2cee, I couldn't figure out where 
exactly this new bsearch implementation originated from, but googling 
around, probably *BSD or libiberty. Tomas, do you remember? Not that it 
matters, but I'm curious.


Some of those other implementations have fixed this, others have not. 
And they all seem to also have the "involes" typo in the comment that we 
fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this 
fix to those other projects too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Umar Hayat
Hi Bruce,
The patch looks good. In my opinion it should be committed, as more and
more people trying to build PG 17 on Windows are facing similar issues.
The latest issue was reported yesterday [0].

Regards
Umar Hayat

[0] https://www.postgresql.org/message-id/flat/CA%2BOCxozXN%3DGFPWU8vBjG6ch%3Di2q46FrQbTMjpsERGaSXQj-%2BtQ%40mail.gmail.com#8c5cefdb5881f4ed49e254d4ef9dbe09


On Thu, 17 Oct 2024 at 07:35, Bruce Momjian  wrote:
>
> On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote:
> >
> > On 2024-07-16 Tu 7:46 AM, Yasir wrote:
> >
> > Hi Hackers,
> >
> > Recently, I compiled PG17 on the windows. Till PG16 "ActiveState Perl", 
> > as
> > instructed in the documentation, was being used successfully on the 
> > Windows
> > 10/11 to compile PG.
> > However, it looks like that "ActiveState Perl" is not valid anymore to
> > compile PG17 on Windows 10/11 but documentation still suggests it.
> > Therefore, I think documentation needs to be updated.
> > Moreover, I had to install "strawberry's perl" in order to compile PG17 
> > on
> > Windows 10/11. Please check out the thread "errors building on windows
> > using meson" highlighting the issue.
> >
> >
> >
> > See  https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net
> >
> > I agree we should fix the docco.
>
> I have written the attached patch to make these changes.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   When a patient asks the doctor, "Am I going to die?", he means
>   "Am I going to die soon?"



