Re: v17 vs v16 performance comparison

2024-08-01 Thread Alexander Lakhin

Hello Thomas.

01.08.2024 08:57, Thomas Munro wrote:

On Thu, Aug 1, 2024 at 3:00 PM Alexander Lakhin  wrote:

So it looks like q15 from TPC-DS is not the only query suffering from that
change.

I'm going to try to set up a local repro to study these new cases.  If
you have a write-up somewhere of how exactly you run that, that'd be
useful.


I'm using this instrumentation (on my Ubuntu 22.04 workstation):
https://github.com/alexanderlaw/pg-mark.git
README.md can probably serve as such a write-up.

If you install all the prerequisites (some tests, including pg_tpcds,
require downloading additional resources; run-benchmarks.py will ask to
do that), there should be no problems with running benchmarks.

I just added two instances to config.xml:
    
    
and ran
1)
./prepare-instances.py -i pg-src-16 pg-src-17

2)
time ./run-benchmarks.py -i pg-src-16 pg-src-17 pg-src-16 pg-src-17 pg-src-17 
pg-src-16
(it took 1045m55,215s on my machine, so you may prefer to choose a single
benchmark (-b pg_tpcds or maybe s64da_tpcds))

3)
./analyze-benchmarks.py -i 'pg-src-17--.*' 'pg-src-16--.*'

All the upper-level commands to run benchmarks are contained in config.xml,
so you can just execute them separately, but my instrumentation eases
processing of the results by creating one unified benchmark-results.xml.

Please feel free to ask any questions or give your feedback.

Thank you for paying attention to this!

Best regards,
Alexander




Re: proposal: schema variables

2024-08-01 Thread Erik Rijkers

doc-build fails: a slash tumbled backwards in doc/src/sgml:

$ grep 'xref.*.\\>'  *.sgml
plpgsql.sgml: (see ).

That should simply be a forward slash, of course.

Thanks,

Erik Rijkers




On 8/1/24 at 08:12, Pavel Stehule wrote:

Hi

fresh rebase + fix format in doc

Regards

Pavel






Re: proposal: schema variables

2024-08-01 Thread Pavel Stehule
On Thu, 1 Aug 2024 at 9:45, Erik Rijkers wrote:

> doc-build fails: a slash tumbled backwards in doc/src/sgml:
>
> $ grep 'xref.*.\\>'  *.sgml
> plpgsql.sgml: (see ).
>
> That should simply be a forward slash, of course.
>

should be fixed in today's patchset



>
> Thanks,
>
> Erik Rijkers
>
>
>
>
> On 8/1/24 at 08:12, Pavel Stehule wrote:
> > Hi
> >
> > fresh rebase + fix format in doc
> >
> > Regards
> >
> > Pavel
> >
>


Re: Pgoutput not capturing the generated columns

2024-08-01 Thread Peter Smith
Hi, Here are my review comments for patch v22-0001

All comments now are only for the TAP test.

==
src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case
"gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table
because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test
(previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for the nogen-to-gen combination, and
after experimenting with this I am getting a bit suspicious.

Currently, it seems that if a COPY is attempted then the error would
be like this:
2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
0001 allows replication to happen. And the generated value of the
subscriber "b" takes precedence.

I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting.  That is why it is
so important to include *every* test combination in these TAP tests --
because unless we know how it works today, we won't know if we are
accidentally breaking the current behaviour with the other (0002,
0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
to make sure the (new?) patch errors make sense and don't overstep by
giving ERRORs when they should not.



Also, many other smaller issues/changes were done:

~~~

Creating tables:

nitpick - rearranged to keep all combo test SQLs in a consistent order
throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen

nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.

nitpick - tweaked some CREATE TABLE comments for consistency.

nitpick - in the v22 patch many of the generated col 'b' use different
computations for every test. It makes it unnecessarily difficult to
read/review the expected results. So, I've made them all the same. Now
computation is "a * 2" on the publisher side, and "a * 22" on the
subscriber side.

~~~

Creating Publications and Subscriptions:


nitpick - added comment for all the CREATE PUBLICATION

nitpick - added comment for all the CREATE SUBSCRIPTION

nitpick - I moved the note about copy_data = false to where all the
node_subscriber2 subscriptions are created. Also, don't explicitly
refer to "patch 000" in the comment, because that will not make any
sense after getting pushed.

nitpick - I changed many subscriber names to consistently use "sub1"
or "sub2" within the name (this is the visual cue of which
node_subscriber they are on). e.g.
/regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:

nitpick - not sure if it is possible to do the initial data tests for
"nogen_to_gen" in the normal place. For now, it is just replaced by a
comment.
NOTE - Maybe this should be refactored later to put all the initial
data checks in one place. I'll think about this point more in the next
review.

~~~

nitpick - Changed the cleanup so that subscriptions are dropped before publications.

nitpick - remove the unnecessary blank line at the end.

==

Please see the attached diffs patch (apply it atop patch 0001) which
includes all the nipick changes mentioned above.

~~

BTW, for a quicker turnaround and less churn, please consider just
posting v23-0001 by itself instead of waiting to rebase all the
subsequent patches. When 0001 settles down some more, then rebase the
others.

~~

Also, please run the indentation tool over this code ASAP.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 05b83f6..504714a 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -34,59 +34,60 @@ $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
22) STORED, c int)"
 );
 
+# tab_gen_to_gen:
 # publisher-side has generated col 'b'.
 # subscriber-side has generated col 'b', with different computation.
 $node_publisher->safe_psql('postgres',
-   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 10) 
STORED)");
+   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 2) 
STORED)");
 $node_subscriber->safe_psql('postgres',
-   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 20) 
STORED)");
+   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 22) 
STORED)");
 $node_subscriber2->safe_psql('postgres',
-   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 20) 
STORED)");
+   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 22) 
STORED)");
 

Query results vary depending on the plan cache mode used

2024-08-01 Thread Richard Guo
While working on the grouping sets patches for queries with GROUP BY
items that are constants, I noticed $subject on master.  As an
example, consider

prepare q1(int) as
select $1 as c1, $1 as c2 from generate_series(1,2) t group by rollup(c1);

set plan_cache_mode to force_custom_plan;
execute q1(3);
 c1 | c2 
----+----
  3 |  3
    |  3
(2 rows)

set plan_cache_mode to force_generic_plan;
execute q1(3);
 c1 | c2 
----+----
  3 |  3
    |
(2 rows)

The reason can be seen in the plans under different modes.

-- force_custom_plan
explain (verbose, costs off) execute q1(3);
                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Output: (3), 3
   Group Key: 3
   Group Key: ()
   ->  Function Scan on pg_catalog.generate_series t
 Output: 3
 Function Call: generate_series(1, 2)
(7 rows)

-- force_generic_plan
explain (verbose, costs off) execute q1(3);
                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Output: ($1), ($1)
   Group Key: $1
   Group Key: ()
   ->  Function Scan on pg_catalog.generate_series t
 Output: $1
 Function Call: generate_series(1, 2)
(7 rows)

In custom mode, the target entry 'c2' is a Const expression, and
setrefs.c does not replace it with an OUTER_VAR, even though there happens
to be an identical Const below.  As a result, when this OUTER_VAR goes
to NULL due to the grouping sets, 'c2' remains as constant 3.  Look at
this code in search_indexed_tlist_for_non_var:

/*
 * If it's a simple Const, replacing it with a Var is silly, even if there
 * happens to be an identical Const below; a Var is more expensive to
 * execute than a Const.  What's more, replacing it could confuse some
 * places in the executor that expect to see simple Consts for, eg,
 * dropped columns.
 */
if (IsA(node, Const))
return NULL;

In generic mode, the target entry 'c2' is a Param expression, and is
replaced with the OUTER_VAR (indicated by the parentheses around the
second '$1').  So it goes to NULL when we're grouping by the set that
does not contain this Var.

Is this inconsistent behavior in different plan cache modes expected,
or does it indicate a bug that needs to be fixed?

Thanks
Richard




Re: Remove duplicate table scan in logical apply worker and code refactoring

2024-08-01 Thread Amit Kapila
On Thu, Aug 1, 2024 at 7:29 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > Thanks for reviewing the patch, and your understanding is correct.
> >
> > Here is the updated patch 0001. I removed the comments as suggested by Amit.
> >
> > Since 0002 patch is only refactoring the code and I need some time to review
> > the comments for it, I will hold it until the 0001 is committed.
>
> Thanks for updating the patch. I did a performance testing with v2-0001.
>
> Before: 15.553 [s]
> After:  7.593 [s]
>

Thanks for the testing. I have pushed the patch.

-- 
With Regards,
Amit Kapila.




RE: Conflict detection and logging in logical replication

2024-08-01 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Let me contribute to this great feature. I have read only the 0001 patch; here are my
initial comments.

01. logical-replication.sgml

track_commit_timestamp must be specified only on the subscriber, but this is not
made clear.
Can you write that down?

02. logical-replication.sgml

I felt that the ordering of {exists, differ, missing} should be consistent, but it
is not.
For update, "differ" is listed after "missing", but for delete, "differ"
comes before "missing". The same inconsistency exists in the source code as well.

03. conflict.h

The copyright seems wrong. 2012 is not needed.

04. general

According to the documentation [1], there is another constraint "exclude", which
can cause another type of conflict. But this pattern cannot be logged in detail.
I tested the workload below as an example.

=
publisher=# create table tab (a int, EXCLUDE (a WITH =));
publisher=# create publication pub for all tables;

subscriber=# create table tab (a int, EXCLUDE (a WITH =));
subscriber=# create subscription sub...;
subscriber=# insert into tab values (1);

publisher=# insert into tab values (1);

-> Got conflict with below log lines:
```
ERROR:  conflicting key value violates exclusion constraint "tab_a_excl"
DETAIL:  Key (a)=(1) conflicts with existing key (a)=(1).
CONTEXT:  processing remote data for replication origin "pg_16389" during 
message type "INSERT"
for replication target relation "public.tab" in transaction 740, finished at 
0/1543940
```
=

Can we support this type of conflict?

[1]: 
https://www.postgresql.org/docs/devel/sql-createtable.html#SQL-CREATETABLE-EXCLUDE

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Improving tracking/processing of buildfarm test failures

2024-08-01 Thread Alexander Lakhin

02.07.2024 15:00, Alexander Lakhin wrote:


One month later,  I'd like to summarize failures that I've investigated
and classified during June, 2024 on the aforementioned wiki page.
(Maybe it would make sense to issue a monthly report with such information
in the future.)


Please take a look at the July report on the buildfarm failures:
# SELECT br, count(*) FROM failures WHERE dt >= '2024-07-01' AND
 dt < '2024-08-01' GROUP BY br;

REL_12_STABLE: 11
REL_13_STABLE: 9
REL_14_STABLE: 7
REL_15_STABLE: 10
REL_16_STABLE: 9
REL_17_STABLE: 68
HEAD: 106
-- Total: 220
(Counting test failures only, excluding indent-check, Configure, Build
errors.)

# SELECT COUNT(*) FROM (SELECT DISTINCT issue_link FROM failures WHERE
 dt >= '2024-07-01' AND dt < '2024-08-01');
40

# SELECT issue_link, count(*) FROM failures WHERE dt >= '2024-07-01' AND
 dt < '2024-08-01' GROUP BY issue_link ORDER BY 2 DESC LIMIT 9;

https://www.postgresql.org/message-id/20240404170055.qynecay7szu3d...@awork3.anarazel.de:
 29
-- An environmental issue

https://www.postgresql.org/message-id/a9a97e83-9ec8-5de5-bf69-80e9560f5...@gmail.com:
 20
-- Probably fixed

https://www.postgresql.org/message-id/1545399.1720554...@sss.pgh.pa.us: 11
-- Fixed

https://www.postgresql.org/message-id/4db099c8-4a52-3cc4-e970-14539a319...@gmail.com:
 9

https://www.postgresql.org/message-id/db093cce-7eec-8516-ef0f-891895178...@gmail.com:
 8
-- An environmental issue; probably fixed

https://www.postgresql.org/message-id/b2037a8d-fe6b-d299-da17-ff5f3214e...@gmail.com:
 8

https://www.postgresql.org/message-id/3e2cbd24-f45e-4b2b-ba83-8149214f0...@dunslane.net:
 8
-- Fixed

https://www.postgresql.org/message-id/68de6498-0449-a113-dd03-e198dded0...@gmail.com:
 8
-- Fixed

https://www.postgresql.org/message-id/3618203.1722473...@sss.pgh.pa.us: 8
-- Fixed

# SELECT count(*) FROM failures WHERE dt >= '2024-07-01' AND
 dt < '2024-08-01' AND issue_link IS NULL; -- Unsorted/unhelpful failures
17

And one more metric that might be useful, though it also requires time
analysis — short-lived (eliminated immediately) failures: 83

I also wrote a simple script (see attached) to check for unknown buildfarm
failures using the "HTML API", to make sure no failures are missed. Surely, it
could be improved in many ways, but I find it rather useful as-is.

Best regards,
Alexander

#!/bin/bash

TMP=${TMPDIR:-/tmp}
wget "https://buildfarm.postgresql.org/cgi-bin/show_failures.pl"; -O 
"$TMP/failures.html"
wget "https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures"; -O 
"$TMP/known-test-failures.html"
sed -E 's/\&max_days/\&amp;max_days/; s/(hours|mins|secs| i) < /\1 \&lt; /' -i "$TMP/failures.html"
cat << 'EOF' > "$TMP/t.xsl"

http://www.w3.org/1999/XSL/Transform";
  xmlns="http://www.w3.org/1999/xhtml";
  xmlns:xhtml="http://www.w3.org/1999/xhtml";>












EOF

for fl in $(xsltproc "$TMP/t.xsl" "$TMP/failures.html"); do
if [[ $fl == show_log* ]]; then
sfl=${fl/\&/\&amp;}
grep -q "$sfl" "$TMP/known-test-failures.html" && continue
echo "An unknown failure found: 
https://buildfarm.postgresql.org/cgi-bin/$fl";
wget "https://buildfarm.postgresql.org/cgi-bin/$fl"; -O 
"$TMP/failure-$fl.log"

il=""
if grep -q -Pzo \
'(?s)pgsql.build/testrun/pg_basebackup/040_pg_createsubscriber/log/regress_log_040_pg_createsubscriber\b.*'\
'ok 29 - run pg_createsubscriber without --databases\s*\n.*'\
'pg_createsubscriber: error: recovery timed out\s*\n.*'\
'not ok 30 - run pg_createsubscriber on node S\s*\n'\
"$TMP/failure-$fl.log"; then

il="https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#040_pg_createsubscriber.pl_fails_due_to_recovery_timed_out_during_pg_createsubscriber_run";

elif grep -q -Pzo \
'(?s)(pgsql.build/testrun/postgres_fdw-running/regress|pgsql.build/testrun/postgres_fdw/regress|pgsql.build/contrib/postgres_fdw)/regression.diffs<.*'\
' ERROR:  canceling statement due to statement timeout\s*\n'\
'\+WARNING:  could not get result of cancel request due to timeout\s*\n'\
"$TMP/failure-$fl.log"; then

il="https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#The_postgres_fdw_test_fails_due_to_an_unexpected_warning_on_canceling_a_statement";

elif grep -q -Pzo \
'(?s)# poll_query_until timed out executing this query:\s*\n'\
'# \s*\n'\
'# SELECT NOT EXISTS \(\s*\n'\
'#  SELECT \*\s*\n'\
'#  FROM pg_database\s*\n'\
"#\s*WHERE age\(datfrozenxid\) \> 
current_setting\('autovacuum_freeze_max_age'\)::int\)\s*\n.*"\
'# Looks like your test exited with 29 just after 1.\s*\n'\
't/001_emergency_vacuum.pl ..\s*\n'\
"$TMP/failure-$fl.log"; then

il="https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#001_emergency_vacuum.pl_fails_to_wait_for_datfrozenxid_advancing";

elif grep -q -Pzo \
'(?s)Details for system "[^"]+" failure at stage pg_amcheckCheck,.*'\
'postgresql:pg_amcheck / pg_amcheck/005_opclass_damage\s

Re: Comment in portal.h

2024-08-01 Thread Etsuro Fujita
Hi,

On Wed, Jul 31, 2024 at 8:55 AM Andy Fan  wrote:
> Etsuro Fujita  writes:
> > I noticed $SUBJECT while working on something else:
> >
> > /*
> >  * Where we store tuples for a held cursor or a PORTAL_ONE_RETURNING or
> >  * PORTAL_UTIL_SELECT query.  (A cursor held past the end of its
> >  * transaction no longer has any active executor state.)
> >  */
> > Tuplestorestate *holdStore; /* store for holdable cursors */
> > MemoryContext holdContext;  /* memory containing holdStore */
> >
> > We do that for PORTAL_ONE_MOD_WITH as well, so the comment should be
> > "Where we store tuples for a held cursor or a PORTAL_ONE_RETURNING,
> > PORTAL_ONE_MOD_WITH, or PORTAL_UTIL_SELECT query.".  Attached is a
> > patch for that.
>
> Patch looks good to me.
>
> All the codes of PortalRun & FillPortalStore & PortalRunSelect are
> consistent with this idea.

Pushed.  Thanks for looking!

Best regards,
Etsuro Fujita




Re: [BUG?] check_exclusion_or_unique_constraint false negative

2024-08-01 Thread Michail Nikolaev
Hello, Hayato!

> Thanks for pointing out the issue!

Thanks for your attention!

> IIUC, the issue can happen when two concurrent transactions using
DirtySnapshot access
> the same tuples, which is not specific to the parallel apply

Not exactly, it happens for any DirtySnapshot scan over a B-tree index with
some other transaction updating the same index page (even using the MVCC
snapshot).

So, logical replication related scenario looks like this:

* subscriber worker receives a tuple update\delete from the publisher
* it calls RelationFindReplTupleByIndex to find the tuple in the local table
* some other transaction updates the tuple in the local table (on
subscriber side) in parallel
* RelationFindReplTupleByIndex may not find the tuple because it uses
DirtySnapshot
* update\delete is lost

Parallel apply mode looks like more dangerous because it uses multiple
workers on the subscriber side, so the probability of the issue is higher.
In that case, "some other transaction" is just another worker applying
changes of different transaction in parallel.

Best regards,
Mikhail.


Re: rare crash - FailedAssertion snapbuild.c Line: 580

2024-08-01 Thread Pradeep Kumar
Hello all,

Any update on this?

Thanks and regards
Pradeep

On Thu, Jul 25, 2024 at 1:09 PM Pradeep Kumar 
wrote:

> Hello all,
>
> Message Id:
> https://www.postgresql.org/message-id/flat/3808dc548e144c860fc3fe57de013809%40xs4all.nl#72629a188e99e14387230fccc8eef518
>
> Actually I'm also facing this issue. Any solution regarding this? Kindly
> give me an explanation of why this error occurs.
>
> Thanks and regards
> Pradeep
>


Re: make pg_createsubscriber option names more consistent

2024-08-01 Thread Peter Eisentraut

On 31.07.24 11:15, Hayato Kuroda (Fujitsu) wrote:

Dear Peter,


I propose to rename the pg_createsubscriber option --socket-directory to
--socketdir.  This would make it match the equivalent option in
pg_upgrade.  (It even has the same short option '-s'.)
pg_createsubscriber and pg_upgrade have a lot of common terminology and
a similar operating mode, so it would make sense to keep this consistent.


+1. If so, should we say "default current dir." instead of "default current 
directory" in usage()
because pg_upgrade says like that?


Committed with that change.  Thanks.





Re: Add LSN <-> time conversion functionality

2024-08-01 Thread Andrey M. Borodin
This is a copy of my message for the pgsql-hackers mailing list. Unfortunately,
the original message was rejected because one of the recipient addresses is blocked.

> On 1 Aug 2024, at 10:54, Andrey M. Borodin  wrote:
> 
> 
> 
>> On 1 Aug 2024, at 05:44, Melanie Plageman  wrote:
>> 
>> Thanks for the review! v6 attached.
>> 
>> On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin  
>> wrote:
>>> 
>>> PgStartLSN = GetXLogInsertRecPtr();
>>> Should this be kind of RecoveryEndPtr? How is it expected to behave on 
>>> Standby in HA cluster, which was doing a crash recovery of 1y WALs in a 
>>> day, then is in startup for a year as a Hot Standby, and then is promoted?
>> 
>> So, I don't think we will allow use of the LSNTimeStream on a standby,
>> since it is unclear what it would mean on a standby. For example, do
>> you want to know the time the LSN was generated or the time it was
>> replayed? Note that bgwriter won't insert values to the time stream on
>> a standby (it explicitly checks).
> 
> Yes, I mentioned Standby because PgStartLSN is not what it says it is.
> 
>> 
>> But, you bring up an issue that I don't quite know what to do about.
>> If the standby doesn't have an LSNTimeStream, then when it is
>> promoted, LSN <-> time conversions of LSNs and times before the
>> promotion seem impossible. Maybe if the stats file is getting written
>> out at checkpoints, we could restore from that previous primary's file
>> after promotion?
> 
> I’m afraid that the clocks of a Primary from a previous timeline might not be
> in sync with ours.
> It’s OK if it causes an error; we just need to be prepared for when they
> indicate values from the future. Perhaps by shifting their last point to our “PgStartLSN”.
> 
>> 
>> This brings up a second issue, which is that, currently, bgwriter
>> won't insert into the time stream when wal_level is minimal. I'm not
>> sure exactly how to resolve it because I am relying on the "last
>> important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle
>> when the bgwriter actually inserts new records into the LSNTimeStream.
>> I could come up with a different timeout interval for updating the
>> time stream. Perhaps I should do that?
> 
> IDK. My knowledge of bgwriter is not enough to give a meaningful advise here.
> 
>> 
>>> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not 
>>> seem capable to accommodate LSN*time. And the function may return negative 
>>> result, despite claiming area as a result. It’s intended, but a little 
>>> misleading.
>> 
>> Ah, great point. I've fixed this.
> 
> Well, not exactly. The result of lsn_ts_calculate_error_area() is still fabs()’ed
> upon usage. I’d recommend doing the fabs() inside the function.
> BTW, lsn_ts_calculate_error_area() has no prototype.
> 
> Also, I’m not a big fan of using IEEE 754 float in this function. This data
> type has only 24 significand bits.
> Consider that the current timestamp has 50 binary digits. Let’s assume realistic
> LSNs have the same 50 bits.
> Then our rounding error is 2^76 byte*microseconds.
> Let’s assume we are interested in measuring time on the scale of 1GB of WAL records.
> This gives us a rounding error of 2^46 microseconds = 2^26 seconds ≈ 67 million
> seconds ≈ 2 years.
> Seems like a gross error.
> 
> If we use IEEE 754 doubles we have 53 significand bits. And the rounding error
> will be on a scale of 2^17 microseconds (~0.13 seconds) per GB, which is acceptable.
> 
> So I think double is better than float here.
> 
> Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) 
> before subtracting. This can save a bit of precision (smaller figures can 
> have lesser exponent).
> 
> 
 On 29 Jun 2024, at 03:09, Melanie Plageman  
 wrote:
 change the user-facing functions for estimating an
 LSN/time conversion to instead return a floor and a ceiling -- instead
 of linearly interpolating a guess. This would be a way to keep users
 from misunderstanding the accuracy of the functions to translate LSN
 <-> time.
>>> 
>>> I think this is a good idea. And it covers well “server restart problem”. 
>>> If API just returns -inf as a boundary, caller can correctly interpret this 
>>> situation.
>> 
>> Providing "ceiling" and "floor" user functions is still a TODO for me,
>> however, I think that the patch mostly does handle server restarts.
>> 
>> In the event of a restart, the cumulative stats system will have
>> persisted our time stream, so the LSNTimeStream will just be read back
>> in with the rest of the stats. I've added logic to ensure that if the
>> PgStartLSN is newer than our oldest LSNTimeStream entry, we use the
>> oldest entry instead of PgStartLSN when doing conversions LSN <->
>> time.
>> 
>> As for a crash, stats do not persist crashes, but I think Michael's
>> patch will go in to write out the stats file at checkpoints, and then
>> this should be good enough.
>> 
>> Is there anything else you think that is an issue with restarts?
> 
> Nope, looks good to me.
> 
>> 
>>> Thanks! Looking forward to 

Re: PG17beta2: SMGR: inconsistent type for nblocks

2024-08-01 Thread Matthias van de Meent
On Tue, 30 Jul 2024 at 14:32, Thomas Munro  wrote:
>
> On Tue, Jul 30, 2024 at 11:24 PM Matthias van de Meent
>  wrote:
> > While working on rebasing the patches of Neon's fork onto the
> > REL_17_STABLE branch, I noticed that the nblocks arguments of various
> > smgr functions have inconsistent types: smgrzeroextend accepts
> > `nblocks` as signed integer, as does the new signature for
> > smgrprefetch, but the new vectorized operations of *readv and *writev,
> > and the older *writeback all use an unsigned BlockNumber as indicator
> > for number of blocks.
> >
> > Can we update the definition to be consistent across this (new, or
> > also older) API? As far as I can see, in none of these cases are
> > negative numbers allowed or expected, so updating this all to be
> > consistently BlockNumber across the API seems like a straightforward
> > patch.
> >
> > cc-ed Thomas as committer of the PG17 smgr API changes.
>
> Yeah, right, I noticed that once myself[1].  For the cases from my
> keyboard, I guess I was trying to be consistent with nearby existing
> stuff in each case, which was already inconsistent...  Do you have a
> patch?

Here's one that covers both master and the v17 backbranch.

Kind regards,

Matthias van de Meent


0001-Update-SMGR-API-to-use-consistent-types-for-nblocks-.patch
Description: Binary data


Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Laurenz Albe
On Wed, 2024-07-31 at 14:43 -0400, Joe Conway wrote:
> I still maintain that there is a whole host of users that would accept 
> the risk of side channel attacks via existence of an error or not, if 
> they could only be sure nothing sensitive leaks directly into the logs 
> or to the clients. We should give them that choice.

I think that you are right.

But what do you tell the users who would not accept that risk?

Yours,
Laurenz Albe




Re: proposal: schema variables

2024-08-01 Thread Laurenz Albe
On Thu, 2024-08-01 at 08:12 +0200, Pavel Stehule wrote:
> fresh rebase + fix format in doc

Thanks!

I'm curious, but too lazy to build the patch now, so I'm asking:
what did you do about this error?

> CREATE VARIABLE var AS date;
> LET var = current_date;
> PREPARE stmt(date) AS SELECT $1;
> EXECUTE stmt(var);
> ERROR:  paramid of PARAM_VARIABLE param is out of range

Or does a later patch take care of that?

Yours,
Laurenz Albe




Re: proposal: schema variables

2024-08-01 Thread Pavel Stehule
On Thu, 1 Aug 2024 at 13:22, Laurenz Albe wrote:

> On Thu, 2024-08-01 at 08:12 +0200, Pavel Stehule wrote:
> > fresh rebase + fix format in doc
>
> Thanks!
>
> I'm curious, but too lazy to build the patch now, so I'm asking:
> what did you do about this error?
>

I'll try to investigate this issue now.

The patchset is just a merge of your work.


> > CREATE VARIABLE var AS date;
> > LET var = current_date;
> > PREPARE stmt(date) AS SELECT $1;
> > EXECUTE stmt(var);
> > ERROR:  paramid of PARAM_VARIABLE param is out of range
>


>
> Or does a later patch take care of that?
>

This is a clear bug, and I have to fix it. I hope I'll do this today.



>
> Yours,
> Laurenz Albe
>


Memory growth observed with C++ application consuming libpq.dll on Windows

2024-08-01 Thread Rajesh Kokkonda
Hi,

We are seeing a gradual growth in the memory consumption of our process on
Windows. Ours is a C++ application that directly loads libpq.dll and
handles the queries and functions. We use setSingleRowMethod to limit the
number of rows returned simultaneously to the application. We do not
observe any memory increase when the application is run on Linux. There is
no code difference between Windows and Linux from the
application standpoint. We ran valgrind against our application on Linux
and found no memory leaks. Since the same code is being used on Windows as
well, we do not suspect any memory leak there.  The question is if there
are any known memory leaks with the version of the library we are using on
Windows. Kindly let us know.
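
For reference, a minimal single-row-mode fetch loop over libpq looks roughly like
the sketch below (illustrative names only, assuming "setSingleRowMethod" is a
wrapper around libpq's PQsetSingleRowMode; this is not our application code). The
point worth double-checking is that every PGresult — including error results and
the terminating PGRES_TUPLES_OK result — is released with PQclear(), since unfreed
results are a common source of apparent memory growth:

#include <stdio.h>
#include <libpq-fe.h>

/* Sketch only: run one query in single-row mode and free every result. */
static int
fetch_all_rows(PGconn *conn, const char *query)
{
    PGresult *res;

    if (!PQsendQuery(conn, query))
        return -1;
    if (!PQsetSingleRowMode(conn))      /* must be requested right after sending */
        return -1;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            printf("%s\n", PQgetvalue(res, 0, 0));  /* assumes >= 1 column */
        else if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s", PQresultErrorMessage(res));
        PQclear(res);           /* every result must be cleared, even errors */
    }
    return 0;
}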

The version of the library on Linux is libpq.so.5.16

The windows version of the library is 16.0.3.0


[image: image.png]

Thanks,
Rajesh


Re: Conflict detection and logging in logical replication

2024-08-01 Thread Amit Kapila
On Thu, Aug 1, 2024 at 2:26 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> 04. general
>
> According to the documentation [1], there is another constraint "exclude", 
> which
> can cause another type of conflict. But this pattern cannot be logged in 
> detail.
>

As per docs, "exclusion constraints can specify constraints that are
more general than simple equality", so I don't think it satisfies the
kind of conflicts we are trying to LOG and then in the future patch
allows automatic resolution for the same. For example, when we have
last_update_wins strategy, we will replace the rows with remote rows
when the key column values match which shouldn't be true in general
for exclusion constraints. Similarly, we don't want to consider other
constraint violations like CHECK to consider as conflicts. We can
always extend the basic functionality for more conflicts if required
but let's go with reporting straight-forward stuff first.

-- 
With Regards,
Amit Kapila.




Re: Memory growth observed with C++ application consuming libpq.dll on Windows

2024-08-01 Thread Yasir
Hi Rajesh,

Can you please attach a sample code snippet showing libpq's functions being
called? It will help to identify the libpq's functions to investigate
further for a potential mem leak.

Regards...

Yasir Hussain

On Thu, Aug 1, 2024 at 4:30 PM Rajesh Kokkonda 
wrote:

> Hi,
>
> We are seeing a gradual growth in the memory consumption of our process on
> Windows. Ours is a C++ application that directly loads libpq.dll and
> handles the queries and functions. We use setSingleRowMethod to limit the
> number of rows returned simultaneously to the application. We do not
> observe any memory increase when the application is run on Linux. There is
> no code difference between Windows and Linux from the
> application standpoint. We ran valgrind against our application on Linux
> and found no memory leaks. Since the same code is being used on Windows as
> well, we do not suspect any memory leak there.  The question is if there
> are any known memory leaks with the version of the library we are using on
> Windows. Kindly let us know.
>
> The version of the library on Linux is libpq.so.5.16
>
> The windows version of the library is 16.0.3.0
>
>
> [image: image.png]
>
> Thanks,
> Rajesh
>


Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Robert Haas
On Wed, Jul 31, 2024 at 4:42 PM Joe Conway  wrote:
> You are assuming that everyone allows direct logins with the ability to
> create procedures. Plenty don't.

Well, if you can send queries, then you can do the same thing, driven
by client-side logic.

If you can't directly send queries in any form, then I guess things
are different. But I don't really understand what kind of system you
have in mind.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Patch] remove duplicated smgrclose

2024-08-01 Thread Junwang Zhao
Hi Steven,

On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  wrote:
>
> Hello, hackers,
>
> I think there may be some duplicated code.
> Function smgrDoPendingDeletes() calls both smgrdounlinkall() and smgrclose().
> But both functions would close the SMgrRelation object — isn't that duplicated behavior?
>
> So I make this patch. Could someone take a look at it?
>
> Thanks for your help,
> Steven
>
> From Highgo.com
>
>
Your change LGTM, but the patch does not apply to HEAD, so
I generated the attached v2 using `git format-patch` with a commit message.

-- 
Regards
Junwang Zhao


v2-0001-remove-duplicated-smgrclose.patch
Description: Binary data


Re: Why is citext/regress failing on hamerkop?

2024-08-01 Thread Oleg Tselebrovskiy

Thomas Munro wrote 2024-05-12 06:31:

Hamerkop is already green on the 15 and 16 branches, apparently
because it's using the pre-meson test stuff that I guess just didn't
run the relevant test.  In other words, nobody would notice the
difference anyway, and a master-only fix would be enough to end this
44-day red streak.


Sorry for necroposting, but in our automated testing system we have
found some failures of this test. The most recent one was a couple of
days ago (see attached files) on PostgreSQL 15.7. Also, I've reported
this bug some time ago [1], but provided an example only for
PostgreSQL 17. Back then the bug was actually found on the 15 or 16
branches (no logs remain from a couple of months back), but I wanted
to show that it was reproducible on 17.

I would appreciate if you would backpatch this change to 15 and 16
branches.

[1] 
https://www.postgresql.org/message-id/6885a0b52d06f7e5910d2b6276bbb4e8%40postgrespro.ru


Oleg Tselebrovskiy, Postgres Pro

The files belonging to this database system will be owned by user 
"GitLabRunner".
This user must also own the server process.

The database cluster will be initialized with locale "English_United 
States.1252".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory 
C:/gr-builds/TaKFe3FF/2/pgpro-dev/postgrespro/ci_base ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... windows
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Moscow
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option 
-A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

pg_ctl -D 
^"C^:^\gr^-builds^\TaKFe3FF^\2^\pgpro^-dev^\postgrespro^\ci^_base^" -l logfile 
start

2024-07-29 14:41:21.649 MSK [14344] LOG:  starting PostgreSQL 15.7, compiled by 
Visual C++ build 1929, 64-bit
2024-07-29 14:41:21.653 MSK [14344] LOG:  listening on IPv6 address "::1", port 
5432
2024-07-29 14:41:21.653 MSK [14344] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2024-07-29 14:41:21.697 MSK [8052] LOG:  database system was shut down at 
2024-07-29 14:41:17 MSK
2024-07-29 14:41:21.735 MSK [14344] LOG:  database system is ready to accept 
connections
2024-07-29 14:41:22.859 MSK [9348] ERROR:  tablespace location must be an 
absolute path
2024-07-29 14:41:22.859 MSK [9348] STATEMENT:  CREATE TABLESPACE 
regress_tblspace LOCATION 'relative';
diff -w -U3 C:/gr-builds/TaKFe3FF/2/pgpro-dev/postgrespro/contrib/citext/expected/citext_utf8.out C:/gr-builds/TaKFe3FF/2/pgpro-dev/postgrespro/contrib/citext/results/citext_utf8.out
--- C:/gr-builds/TaKFe3FF/2/pgpro-dev/postgrespro/contrib/citext/expected/citext_utf8.out	2024-07-29 13:53:45.259126600 +0300
+++ C:/gr-builds/TaKFe3FF/2/pgpro-dev/postgrespro/contrib/citext/results/citext_utf8.out	2024-07-29 14:43:38.772857200 +0300
@@ -54,7 +54,7 @@
 SELECT 'i'::citext = 'İ'::citext AS t;
  t 
 ---
- t
+ f
 (1 row)
 
 -- Regression.


Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Joe Conway

On 8/1/24 07:17, Laurenz Albe wrote:

On Wed, 2024-07-31 at 14:43 -0400, Joe Conway wrote:
I still maintain that there is a whole host of users that would accept 
the risk of side channel attacks via existence of an error or not, if 
they could only be sure nothing sensitive leaks directly into the logs 
or to the clients. We should give them that choice.


I think that you are right.


thanks


But what do you tell the users who would not accept that risk?


Document that the option should not be used if that is the case

¯\_(ツ)_/¯

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Joe Conway

On 8/1/24 07:57, Robert Haas wrote:

On Wed, Jul 31, 2024 at 4:42 PM Joe Conway  wrote:

You are assuming that everyone allows direct logins with the ability to
create procedures. Plenty don't.


Well, if you can send queries, then you can do the same thing, driven
by client-side logic.


Sure. Of course you should be monitoring your production servers for 
anomalous workloads, no? "Gee, why is Joe running the same query 
millions of times that keeps throwing errors? Maybe we should go see 
what Joe is up to"



If you can't directly send queries in any form, then I guess things
are different.


Right, and there are plenty of those. I have even worked with at least 
one rather large one on behalf of your employer some years ago.



But I don't really understand what kind of system you have in mind.


Well I did say I was being hand wavy ;-)

It has been a long time since I thought deeply about this. I will try to 
come back with something more concrete if no one beats me to it.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Tom Lane
Joe Conway  writes:
> On 8/1/24 07:17, Laurenz Albe wrote:
>> But what do you tell the users who would not accept that risk?

> Document that the option should not be used if that is the case

Are you proposing that we invent two levels of leakproofness
with different guarantees?  That seems like a mess, not least
because there are going to be varying opinions about where we
should set the bar for the lower level.

regards, tom lane




Re: Query results vary depending on the plan cache mode used

2024-08-01 Thread Tom Lane
Richard Guo  writes:
> While working on the grouping sets patches for queries with GROUP BY
> items that are constants, I noticed $subject on master.  As an
> example, consider

This particular example seems like it's just an illustration of
our known bugs with grouping sets using non-Var columns.  I see
no reason to blame the plan cache for it.  Do you have any
examples that don't depend on such a bug?

(And yes, I apologize for not yet having reviewed your patch
to fix the grouping-sets bug.)

regards, tom lane




Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Robert Haas
On Thu, Aug 1, 2024 at 10:05 AM Joe Conway  wrote:
> Sure. Of course you should be monitoring your production servers for
> anomalous workloads, no? "Gee, why is Joe running the same query
> millions of times that keeps throwing errors? Maybe we should go see
> what Joe is up to"

I think it's possible that something like this could be part of some
useful approach, but I think it's difficult. If it would take Joe a
month of pounding on the server to steal enough data to matter, then I
think monitoring could be one required part of an information
protection strategy. However, if Joe, or someone with Joe's
credentials, can steal all the secret data in under an hour, the
monitoring system probably doesn't help much. A human being probably
won't react quickly enough to stop something bad from happening,
especially if the person with Joe's credentials begins the attack at
2am on Christmas.

More generally, I think it's difficult for us to build infrastructure
into PostgreSQL that relies on complex assumptions about what the
customer environment is. To some extent, we are already relying on
users to prevent certain types of attacks. For example, RLS supposes
that timing attacks or plan-shape based attacks won't be feasible, but
we don't do anything to prevent them; we just hope the user takes care
of it. That's already a shaky assumption, because timing attacks could
well be feasible across a fairly deep application stack e.g. the user
issues an HTTP query for a web page and can detect variations in the
underlying database query latency.

When you start proposing assumptions that the user can't execute DDL
or can't execute SQL queries or that there's monitoring of the error
log in place, I feel the whole thing gets very hard to reason about.
First, you have to decide on exactly what the assumptions are - no
DDL, no direct SQL at all, something else? Different situations could
exist for different users, so whatever assumption we make will not
apply to everyone. Second, for some of this stuff, there's a sliding
scale. If we stipulate that a user is going to need a monitoring
system, how good does that monitoring system have to be? What does it
have to catch, and how quickly are the humans required to respond? If
we stipulate that the attacker can't execute SQL directly, how much
control over the generated SQL are they allowed to have?

I don't want to make it sound like I think it's hopeless to come up
with something clever here. The current situation kind of sucks, and I
do have hopes that there are better ideas out there. At the same time,
we need to be able to articulate clearly what we are and are not
guaranteeing and under what set of assumptions, and it doesn't seem
easy to me to come up with something satisfying.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Official devcontainer config

2024-08-01 Thread Junwang Zhao
Stack Overflow 2024 developer survey[1] said VSCode
is the most used development environment.

In a PostgreSQL Hacker Mentoring discussion, we talked
about how to use VSCode to debug and run postgres.
Andrey (cc'd) has tons of tips for new developers, and
I posted my daily-used devcontainer config[2]; Jelte (cc'd)
suggested that it might be a good idea to integrate the
config into the postgres repo so that the barrier to entry for
new developers will be much lower.

**Note**

This is not intended to change the workflow of experienced
hackers, it is just hoping to make the life easier for
beginner developers.

**How to use**

Open VSCode Command Palette(cmd/ctrl + shift + p),
search devcontainer, then choose something like
`Dev containers: Rebuild and Reopen in Container`, you are
good to go.

**About the patch**

devcontainer.json:

The .devcontainer/devcontainer.json is the entry point for
VSCode to *open a folder to develop in a container*; it will build
the docker image the first time you open the folder in a container,
which will take some time.

There are some parameters (runArgs) for running the container;
we need some settings and privileges to run perf or generate
core dumps.

It has a mount point mapping the host's $HOME/freedom
to the container's /opt/freedom; I chose the name *freedom*
because the container is kind of a jail.

It also installed some vscode extensions and did some
customizations.

After diving into the container, postCreateCommand.sh
will be called automatically; it does some configuration
(git, perf, .vscode, core_pattern, etc.). It also downloads
michaelpq's pg_plugins and FlameGraph.

Dockerfile:

It is based on Debian bookworm; it installs the dependencies
to build postgres, plus IPC::Run to run the TAP tests, I guess.

It also has a .gdbinit that breaks at elog.c:errfinish for elevel 21 (ERROR),
which makes it easier to debug where an error is raised.

gdbpg.py is adapted from https://github.com/tvondra/gdbpg;
I think putting it here will help it evolve over time.

tasks.json:

This is kind of like bookkeeping for developers; it has
the following commands:

- meson debug setup
- meson release setup
- ninja build
- regression tests
- ninja install
- init cluster
- start cluster
- stop cluster
- install pg_bsd_indent
- pgindent
- apply patch
- generate flamegraph

launch.json:

It has one configuration that makes it possible to attach
to a process (e.g. a postgres backend) and debug it
with vscode.

PFA and please give it a try if you are a VSCode user.

[1]: 
https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment
[2]: https://github.com/atomicdb/devcontainer/tree/main/postgres

--
Regards
Junwang Zhao


v1-0001-official-devcontainer-config.patch
Description: Binary data


Re: [PATCH] Add crc32(text) & crc32(bytea)

2024-08-01 Thread Nathan Bossart
+/*
+ * Calculate CRC32 of the given data.
+ */
+static inline pg_crc32
+crc32_sz(const char *buf, int size)
+{
+   pg_crc32crc;
+   const char *p = buf;
+
+   INIT_TRADITIONAL_CRC32(crc);
+   while (size > 0)
+   {
+   charc = (char) (*p);
+
+   COMP_TRADITIONAL_CRC32(crc, &c, 1);
+   size--;
+   p++;
+   }
+   FIN_TRADITIONAL_CRC32(crc);
+   return crc;
+}

I'm curious why we need to do this instead of only using the macros:

INIT_TRADITIONAL_CRC32(crc);
COMP_TRADITIONAL_CRC32(crc, VARDATA_ANY(in), len);
FIN_TRADITIONAL_CRC32(crc);
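
For illustration, the macro-only version could look roughly like this (a sketch
only, with a hypothetical function name and return type, includes omitted; not
the patch's actual code):

Datum
crc32_bytea(PG_FUNCTION_ARGS)
{
    bytea      *in = PG_GETARG_BYTEA_PP(0);
    pg_crc32    crc;

    /* Feed the whole datum to the CRC in one call instead of byte by byte. */
    INIT_TRADITIONAL_CRC32(crc);
    COMP_TRADITIONAL_CRC32(crc, VARDATA_ANY(in), VARSIZE_ANY_EXHDR(in));
    FIN_TRADITIONAL_CRC32(crc);

    PG_RETURN_INT64((int64) crc);
}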

+ * IDENTIFICATION
+ *src/backend/utils/adt/hashfuncs.c

Perhaps these would fit into src/backend/utils/hash/pg_crc.c?

-- 
nathan




Re: Official devcontainer config

2024-08-01 Thread Jelte Fennema-Nio
On Thu, 1 Aug 2024 at 16:56, Junwang Zhao  wrote:
> I posted my daily-used devcontainer config[2]; Jelte (cc'd)
> suggested that it might be a good idea to integrate the
> config into the postgres repo so that the barrier to entry for
> new developers will be much lower.

In my experience adding a devcontainer config has definitely made it
easier for people to contribute to Citus. So thank you for working on
this! This is not a full review, but an initial pass.

> After diving into the container, postCreateCommand.sh
> will be called automatically; it does some configuration
> (git, perf, .vscode, core_pattern, etc.). It also downloads
> michaelpq's pg_plugins and FlameGraph.

I think the .git settings don't fit well here; they are mostly aliases
which are very much based on personal preference and not related to
Postgres development. It seems better to recommend that users use the
devcontainer dotfiles support for this:
https://code.visualstudio.com/docs/devcontainers/containers#_personalizing-with-dotfile-repositories

> - pgindent

It might make sense to install Tristan's (cc'd) Postgres Hacker
extension for vscode to make this a bit more user-friendly:
https://marketplace.visualstudio.com/items?itemName=tristan957.postgres-hacker




Re: PG17beta2: SMGR: inconsistent type for nblocks

2024-08-01 Thread Andres Freund
Hi,

On 2024-08-01 12:45:16 +0200, Matthias van de Meent wrote:
> On Tue, 30 Jul 2024 at 14:32, Thomas Munro  wrote:
> >
> > On Tue, Jul 30, 2024 at 11:24 PM Matthias van de Meent
> >  wrote:
> > > While working on rebasing the patches of Neon's fork onto the
> > > REL_17_STABLE branch, I noticed that the nblocks arguments of various
> > > smgr functions have inconsistent types: smgrzeroextend accepts
> > > `nblocks` as signed integer, as does the new signature for
> > > smgrprefetch, but the new vectorized operations of *readv and *writev,
> > > and the older *writeback all use an unsigned BlockNumber as indicator
> > > for number of blocks.
> > >
> > > Can we update the definition to be consistent across this (new, or
> > > also older) API? As far as I can see, in none of these cases are
> > > negative numbers allowed or expected, so updating this all to be
> > > consistently BlockNumber across the API seems like a straightforward
> > > patch.
> > >
> > > cc-ed Thomas as committer of the PG17 smgr API changes.
> >
> > Yeah, right, I noticed that once myself[1].  For the cases from my
> > keyboard, I guess I was trying to be consistent with nearby existing
> > stuff in each case, which was already inconsistent...  Do you have a
> > patch?
> 
> Here's one that covers both master and the v17 backbranch.

FWIW, I find it quite ugly to use BlockNumber to indicate the number of blocks
to be written. It's just further increasing the type confusion by conflating
"the first block to be targeted" and "number of blocks".


> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> index 6796756358..1d02766978 100644
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -523,11 +523,11 @@ mdextend(SMgrRelation reln, ForkNumber forknum, 
> BlockNumber blocknum,
>   */
>  void
>  mdzeroextend(SMgrRelation reln, ForkNumber forknum,
> -  BlockNumber blocknum, int nblocks, bool skipFsync)
> +  BlockNumber blocknum, BlockNumber nblocks, bool 
> skipFsync)
>  {
>   MdfdVec*v;
>   BlockNumber curblocknum = blocknum;
> - int remblocks = nblocks;
> + int64   remblocks = nblocks;
>  
>   Assert(nblocks > 0);

Isn't this particularly bogus? What's the point of using a 64bit remblocks
here?

Greetings,

Andres Freund




pg_dump: optimize dumpFunc()

2024-08-01 Thread Nathan Bossart
I've recently committed some optimizations for dumping sequences and
pg_class information (commits 68e9629, bd15b7d, and 2329cad), and I noticed
that we are also executing a query per function in pg_dump.  Commit be85727
optimized this by preparing the query ahead of time, but I found that we
can improve performance further by gathering all the relevant data in a
single query.  Here are the results I see for a database with 10k simple
functions with and without the attached patch:

with patch:

$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.04s user 0.01s system 40% cpu 0.118 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.04s user 0.01s system 41% cpu 0.107 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.04s user 0.01s system 42% cpu 0.103 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.04s user 0.01s system 44% cpu 0.105 
total

without patch:

$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.05s user 0.03s system 32% cpu 0.253 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.05s user 0.03s system 32% cpu 0.252 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.06s user 0.03s system 32% cpu 0.251 
total
$ time pg_dump postgres >/dev/null
pg_dump postgres > /dev/null  0.06s user 0.03s system 33% cpu 0.254 
total

This one looks a little different than the sequence/pg_class commits.  Much
of the function information isn't terribly conducive to parsing into
fixed-size variables in an array, so instead I've opted to just leave the
PGresult around for reference by dumpFunc().  This patch also creates an
ordered array of function OIDs to speed up locating the relevant index in
the PGresult for use in calls to PQgetvalue().
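
As a sketch of that lookup (illustrative only; the helper name and the nfuncoids
counter are assumptions here rather than taken from the patch excerpt below,
which defines funcoids, funcs, and OidCmp):

/* Find the PGresult row for a given function OID via binary search over the
 * sorted funcoids array collected by collectFuncs(). */
static int
findFuncRow(Oid oid)
{
    Oid *entry = (Oid *) bsearch(&oid, funcoids, nfuncoids, sizeof(Oid), OidCmp);

    if (entry == NULL)
        pg_fatal("could not find function with OID %u", oid);
    return (int) (entry - funcoids);
}

/* ... and then in dumpFunc(): */
int     row = findFuncRow(finfo->dobj.catId.oid);
char   *prosrc = PQgetvalue(funcs, row, PQfnumber(funcs, "prosrc"));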

I may be running out of opportunities where this style of optimization
makes much difference.  I'll likely start focusing on the restore side
soon.

-- 
nathan
>From da388f00c57ebc743e9229f0f306b074d35b0be5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 31 Jul 2024 16:47:44 -0500
Subject: [PATCH v1 1/1] optimize dumpFunc()

---
 src/bin/pg_dump/pg_backup.h |   1 -
 src/bin/pg_dump/pg_dump.c   | 232 
 2 files changed, 132 insertions(+), 101 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fbf5f1c515..c1e496ee71 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -68,7 +68,6 @@ enum _dumpPreparedQueries
PREPQUERY_DUMPCOMPOSITETYPE,
PREPQUERY_DUMPDOMAIN,
PREPQUERY_DUMPENUMTYPE,
-   PREPQUERY_DUMPFUNC,
PREPQUERY_DUMPOPR,
PREPQUERY_DUMPRANGETYPE,
PREPQUERY_DUMPTABLEATTACH,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 79190470f7..6297ac0599 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,10 @@ static int nbinaryUpgradeClassOids = 0;
 static SequenceItem *sequences = NULL;
 static int nsequences = 0;
 
+/* functions */
+static Oid *funcoids = NULL;
+static PGresult *funcs = NULL;
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
@@ -289,6 +293,7 @@ static void dumpCompositeTypeColComments(Archive *fout, 
const TypeInfo *tyinfo,

 PGresult *res);
 static void dumpShellType(Archive *fout, const ShellTypeInfo *stinfo);
 static void dumpProcLang(Archive *fout, const ProcLangInfo *plang);
+static void collectFuncs(Archive *fout);
 static void dumpFunc(Archive *fout, const FuncInfo *finfo);
 static void dumpCast(Archive *fout, const CastInfo *cast);
 static void dumpTransform(Archive *fout, const TransformInfo *transform);
@@ -1032,6 +1037,10 @@ main(int argc, char **argv)
/* Collect sequence information. */
collectSequences(fout);
 
+   /* Collect function information. */
+   if (!dopt.dataOnly)
+   collectFuncs(fout);
+
/* Lastly, create dummy objects to represent the section boundaries */
boundaryObjs = createBoundaryObjects();
 
@@ -12065,6 +12074,101 @@ format_function_signature(Archive *fout, const 
FuncInfo *finfo, bool honor_quote
return fn.data;
 }
 
+/*
+ * bsearch() comparator for Oid
+ */
+static int
+OidCmp(const void *p1, const void *p2)
+{
+   Oid o1 = *((const Oid *) p1);
+   Oid o2 = *((const Oid *) p2);
+
+   return pg_cmp_u32(o1, o2);
+}
+
+/*
+ * collectFuncs
+ *
+ * Obtain function metadata and construct an ordered array of function OIDs for
+ * use by dumpFunc() to quickly find the index of a function entry.
+ */
+static void
+collectFuncs(Archive *fout)
+{
+   PQExpBuffer query = createPQExpBuffer()

Re: Flush pgstats file during checkpoints

2024-08-01 Thread Michael Paquier
On Tue, Jul 30, 2024 at 08:53:48AM +, Bertrand Drouvot wrote:
> Did a quick check and still LGTM.

Applied 0003 for now to add the redo LSN to the pgstats file, adding
the redo LSN to the two DEBUG2 entries when reading and writing while
on it, that I forgot.  (It was not 01:57 where I am now.)

Attached is the last one.
--
Michael
From 8f1f7a9ca9c19dae88057a5593b0f9c0cd748a88 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 17 Jul 2024 12:42:43 +0900
Subject: [PATCH v6] Flush pgstats file during checkpoints

The flushes are done for non-shutdown checkpoints and restart points, so
that it is possible to keep some statistics around in the event of a
crash.  Keeping the before_shmem_exit() callback to flush the pgstats
file in a shutdown sequence is a bit easier than doing so at checkpoint,
as there may be stats to flush before we are completely done with the
shutdown checkpoint.  Keeping the callback also keeps the shutdown of
single-user mode simpler, where the stats also need to be flushed.

At the beginning of recovery, the stats file is loaded when the redo LSN
it stores matches with the point we are replaying from.  The file is not
removed anymore after being read, to ensure that there is still a source
of stats to feed on should the system crash until the next checkpoint
that would update the stats.
---
 src/include/pgstat.h  |  4 +-
 src/backend/access/transam/xlog.c | 30 +
 src/backend/utils/activity/pgstat.c   | 65 ---
 src/test/recovery/t/029_stats_restart.pl  | 26 ++--
 .../recovery/t/030_stats_cleanup_replica.pl   |  2 +-
 5 files changed, 57 insertions(+), 70 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 043d39a565..f780b56cc3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -466,9 +466,9 @@ typedef struct PgStat_PendingWalStats
 extern Size StatsShmemSize(void);
 extern void StatsShmemInit(void);
 
-/* Functions called during server startup / shutdown */
+/* Functions called during server startup / checkpoint / shutdown */
 extern void pgstat_restore_stats(XLogRecPtr redo);
-extern void pgstat_discard_stats(void);
+extern void pgstat_flush_stats(XLogRecPtr redo);
 extern void pgstat_before_server_shutdown(int code, Datum arg);
 
 /* Functions for backend initialization */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6499eabe4d..f058d68fc3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5401,7 +5401,6 @@ StartupXLOG(void)
 	XLogCtlInsert *Insert;
 	CheckPoint	checkPoint;
 	bool		wasShutdown;
-	bool		didCrash;
 	bool		haveTblspcMap;
 	bool		haveBackupLabel;
 	XLogRecPtr	EndOfLog;
@@ -5521,10 +5520,7 @@ StartupXLOG(void)
 	{
 		RemoveTempXlogFiles();
 		SyncDataDirectory();
-		didCrash = true;
 	}
-	else
-		didCrash = false;
 
 	/*
 	 * Prepare for WAL recovery if needed.
@@ -5645,14 +5641,8 @@ StartupXLOG(void)
 	 *
 	 * NB: Restoring replication slot stats relies on slot state to have
 	 * already been restored from disk.
-	 *
-	 * TODO: With a bit of extra work we could just start with a pgstat file
-	 * associated with the checkpoint redo location we're starting from.
 	 */
-	if (didCrash)
-		pgstat_discard_stats();
-	else
-		pgstat_restore_stats(checkPoint.redo);
+	pgstat_restore_stats(checkPoint.redo);
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7244,6 +7234,15 @@ CreateCheckPoint(int flags)
 	 */
 	END_CRIT_SECTION();
 
+	/*
+	 * The control file update is done, hence it is time to write some fresh
+	 * statistics.  Stats are flushed at shutdown by the checkpointer in a
+	 * dedicated before_shmem_exit callback, combined with sanity checks
+	 * related to the shutdown of pgstats.
+	 */
+	if (!shutdown)
+		pgstat_flush_stats(RedoRecPtr);
+
 	/*
 	 * WAL summaries end when the next XLOG_CHECKPOINT_REDO or
 	 * XLOG_CHECKPOINT_SHUTDOWN record is reached. This is the first point
@@ -7713,6 +7712,15 @@ CreateRestartPoint(int flags)
 	}
 	LWLockRelease(ControlFileLock);
 
+	/*
+	 * The control file update is done, hence it is time to write some fresh
+	 * statistics.  Stats are flushed at shutdown by the checkpointer in a
+	 * dedicated before_shmem_exit callback, combined with sanity checks
+	 * related to the shutdown of pgstats.
+	 */
+	if ((flags & CHECKPOINT_IS_SHUTDOWN) == 0)
+		pgstat_flush_stats(RedoRecPtr);
+
 	/*
 	 * Update the average distance between checkpoints/restartpoints if the
 	 * prior checkpoint exists.
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 81484222cf..0b02353359 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -10,9 +10,10 @@
  * statistics.
  *
  * Statistics are loaded from the filesystem during startup (by the startup
- * process), unless preceded by a crash, in which case all stats are
- * discarded. They are written out by the checkpointer 

Re: [Proposal] Add foreign-server health checks infrastructure

2024-08-01 Thread Fujii Masao




On 2024/07/29 12:58, Hayato Kuroda (Fujitsu) wrote:

> Dear Fujii-san,
>
>>> IIUC, the patch which adds user_name attribute to get_connection() can be
>>> discussed in later stage, is it right?
>>
>> No, let's work on the patch at this stage :)
>
> OK, here is a rebased patch.

Thanks for updating the patch!

> - Changed the name of new API from `GetUserMappingFromOid` to `GetUserMappingByOid`
>   to keep the name consistent with others.


If we expose this function as an FDW helper function, it should return
a complete UserMapping object, including umoptions.

However, if postgres_fdw_get_connections() is the only user of this function,
I'm not inclined to expose it as an FDW helper. Instead, we can either get
the user ID by user mapping OID directly in connection.c using SearchSysCache(),
or add the user ID to ConnCacheEntry and use it in 
postgres_fdw_get_connections().
Thought?
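
For the former, a rough sketch of what the lookup could look like (just an
illustration with assumed variable names; umid would be the user mapping OID
kept in the connection hash key):

/* hypothetical sketch: fetch the user ID for a user mapping OID */
Oid         userid = InvalidOid;
HeapTuple   tup;

tup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(umid));
if (HeapTupleIsValid(tup))
{
    Form_pg_user_mapping umform = (Form_pg_user_mapping) GETSTRUCT(tup);

    userid = umform->umuser;
    ReleaseSysCache(tup);
}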

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PG17beta2: SMGR: inconsistent type for nblocks

2024-08-01 Thread Matthias van de Meent
Hi,

On Thu, 1 Aug 2024 at 18:44, Andres Freund  wrote:
> On 2024-08-01 12:45:16 +0200, Matthias van de Meent wrote:
> > Here's one that covers both master and the v17 backbranch.
>
> FWIW, I find it quite ugly to use BlockNumber to indicate the number of blocks
> to be written. It's just further increasing the type confusion by conflating
> "the first block to be targeted" and "number of blocks".

If BlockNumber doesn't do it for you, then between plain uint32 and
int64, which would you prefer? int itself doesn't allow syncing of all
blocks of a relation's fork, so that's out for me.

> > diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> > index 6796756358..1d02766978 100644
> > --- a/src/backend/storage/smgr/md.c
> > +++ b/src/backend/storage/smgr/md.c
> > @@ -523,11 +523,11 @@ mdextend(SMgrRelation reln, ForkNumber forknum, 
> > BlockNumber blocknum,
> >   */
> >  void
> >  mdzeroextend(SMgrRelation reln, ForkNumber forknum,
> > -  BlockNumber blocknum, int nblocks, bool skipFsync)
> > +  BlockNumber blocknum, BlockNumber nblocks, bool 
> > skipFsync)
> >  {
> >   MdfdVec*v;
> >   BlockNumber curblocknum = blocknum;
> > - int remblocks = nblocks;
> > + int64   remblocks = nblocks;
> >
> >   Assert(nblocks > 0);
>
> Isn't this particularly bogus? What's the point of using a 64bit remblocks
> here?

To prevent underflows in the loop below, if any would happen to exist.
Could've been BlockNumber too, but I went with a slightly more
defensive approach.

Kind regards,

Matthias van de Meent




Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-01 Thread Nathan Bossart
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
> I like your idea of parallelizing these checks with async libpq API, thanks
> for working on it. The patch doesn't apply cleanly on master anymore, but
> I've rebased locally and taken it for a quick spin with a pg16 instance of
> 1000 empty databases. Didn't see any regressions with -j 1, there's some
> speedup with -j 8 (33 sec vs 8 sec for these checks).

Thanks for taking a look.  I'm hoping to do a round of polishing before
posting a rebased patch set soon.

> One thing that I noticed that could be improved is we could start a new
> connection right away after having run all query callbacks for the current
> connection in process_slot, instead of just returning and establishing the
> new connection only on the next iteration of the loop in async_task_run
> after potentially sleeping on select.

Yeah, we could just recursively call process_slot() right after freeing the
slot.  That'd at least allow us to avoid the spinning behavior as we run
out of databases to process, if nothing else.

> +1 to Jeff's suggestion that perhaps we could reuse connections, but perhaps
> that's a separate story.

When I skimmed through the various tasks, I didn't see a ton of
opportunities for further consolidation, or at least opportunities that
would help for upgrades from supported versions.  The data type checks are
already consolidated, for example.

-- 
nathan




Re: On disable_cost

2024-08-01 Thread Robert Haas
On Wed, Jul 31, 2024 at 10:01 PM David Rowley  wrote:
> I've reviewed both patches, here's what I noted down during my review:

Thanks.

> 0. I've not seen any mention so far about postgres_fdw's
> use_remote_estimate.  Maybe changing the costs is fixing an issue that
> existed before. I'm just not 100% sure on that.
>
> patched:
>  Foreign Scan on ft  (cost=100.00..671.00 rows=2550 width=4)
>
> master:
>  Foreign Scan on ft  (cost=1000100.00..1000671.00 rows=2550 width=4)
>
> I kinda think that might be fixing an issue that I don't recall being
> reported before. I think we shouldn't really care that much about what
> nodes are disabled on the remote server and not having disabled_cost
> applied to that gives us that.

Hmm, I think it's subjective which behavior is better. If somebody
thought the new behavior was worse, they might want the remote side's
count of disabled nodes to be propagated to the local side, but I'm
disinclined to go there. My guess is that it doesn't matter much
either way what we do here, so I'd rather not add more code.

> 1. The final sentence of the function header comment needs to be
> updated in estimate_path_cost_size().

Fixed.

> 2. Does cost_tidscan() need to update the header comment to say
> tidquals must not be empty?

IMHO, no. The assertions I added to that function were intended as
documentation of what that function was already assuming about the
behavior of its caller. I had to trace through the logic in tidpath.c
for quite a while to understand why cost_tidscan() was not completely
broken. To spare the next person the trouble of working that out, I
added assertions. Now we could additionally add commentary in English
that restates what the assertions already say, but I feel like having
the assertions is good enough. If somebody ever whacks around
tidpath.c such that these assertions start failing, I think it will be
fairly clear to them that they either need to revert their changes in
tidpath.c or upgrade the logic in this function to cope.

> 3. final_cost_nestloop() seems to initially use the disabled_nodes
> from initial_cost_nestloop() but then it goes off and calculates it
> again itself. One of these seems redundant.

Oops. Fixed.

> The "We could include
> disable_cost in the preliminary estimate" comment explains why it was
> originally left to final_cost_nestloop(), so maybe worth sticking to
> that? I don't quite know the full implications, but it does not seem
> worth risking a behaviour change here.

I don't really see how there could be a behavior change here, unless
there's a bug. Dealing with the enable_* flags in initial_cost_XXX
rather than final_cost_XXX could be better or worse from a performance
standpoint and it could make for cleaner or less clean code, but the
user-facing behavior should be identical unless there are bugs.

The reason why I changed this is because of the logic in
add_path_precheck(): it exits early as soon as it sees a path whose
total cost is greater than the cost of the proposed new path. Since
the patch's aim is to treat disabled_node as a high-order component of
the cost, we need to make the same decision by comparing the count of
disabled_nodes first and then if that is equal, we need to compare the
total_cost. We can't do that if we don't have the count of
disabled_nodes for the proposed new path.

I think this may be a bit hard to understand, so let me give a
concrete example. Suppose we're planning some join where one side can
only be planned with a sequential scan and sequential scans are
disabled. We have ten paths in the path list and they have costs of
1e10+100, 1e10+200, ..., 1e10+1000. Now add_path_precheck() is asked
to consider a new path where there is a disabled node on BOTH sides of
the join -- the one side has the disabled sequential scan, but now the
other side also has something disabled, so the cost is let's say
2e10+79. add_path_precheck() can see at once that this path is a
loser: it can't possibly dominate any path that already exists,
because it costs more than any of them. But when you take disable_cost
out, things look quite different. Now you have a proposed path with a
total_cost of 79 and a path list with costs of 100, ..., 1000. If
you're not allowed to know anything about disabled_nodes, the new path
looks like it might be valuable. You might decide to construct it and
try inserting into the pathlist, which will end up being useless, and
even if you don't, you're going to compare its pathkeys and
parameterization to each of the 10 existing paths before giving up.
Bummer.
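
To spell that out in code form, here is a minimal sketch (hypothetical, not
the patch's actual function) of the comparison rule: disabled-node counts are
compared first, and total cost only breaks ties.

/* disabled_nodes acts as a high-order component of the cost */
static int
compare_disabled_then_cost(int disabled_a, Cost cost_a,
                           int disabled_b, Cost cost_b)
{
    if (disabled_a != disabled_b)
        return (disabled_a < disabled_b) ? -1 : +1;
    if (cost_a < cost_b)
        return -1;
    if (cost_a > cost_b)
        return +1;
    return 0;
}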

So, to avoid getting much stupider than it is currently,
add_path_precheck() needs a preliminary estimate of the number of
disabled nodes just like it needs a preliminary estimate of the total
cost. And to avoid regressions, that estimate needs to be pretty good.
A naive estimate would be to just add up the number of disabled_nodes
on the inner and outer paths, but that would be a regression in the
merge-join case, b

Re: rare crash - FailedAssertion snapbuild.c Line: 580

2024-08-01 Thread Euler Taveira
On Thu, Aug 1, 2024, at 6:59 AM, Pradeep Kumar wrote:
> Any update on this?

Are you using version 11? This version is obsolete and is not supported
anymore. Consider a supported version [1].

Per the following commit (in version 16), this Assert was replaced by
elog.

commit: 240e0dbacd390a8465552e27c5af11f67d747adb
author: Amit Kapila 
date: Mon, 21 Nov 2022 08:54:43 +0530
committer: Amit Kapila 
date: Mon, 21 Nov 2022 08:54:43 +0530
Add additional checks while creating the initial decoding snapshot.

As per one of the CI reports, there is an assertion failure which
indicates that we were trying to use an unenforced xmin horizon for
decoding snapshots. Though, we couldn't figure out the reason for
assertion failure these checks would help us in finding the reason if the
problem happens again in the future.

Author: Amit Kapila based on suggestions by Andres Freund
Reviewd by: Andres Freund
Discussion: 
https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=zuwyaym...@mail.gmail.com

According to this discussion, there isn't a clue about the root cause.
If you have a test case, share it (especially if you are observing it in
version 16+, which exposes some data that may be useful for analysis).


[1] https://www.postgresql.org/support/versioning/


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Jeff Davis
EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through
ExecCreateTableAs(), but does use CreateIntoRelDestReceiver().

That bypasses the SECURITY_RESTRICTED_OPERATION in ExecCreateTableAs().
That is *not* a security problem, because the
SECURITY_RESTRICTED_OPERATION in CREATE MATERIALIZED VIEW is merely for
consistency with a later REFRESH MATERIALIZED VIEW command where the
SECURITY_RESTRICTED_OPERATION is important.

But it is inconsistent. For example:

  create or replace function set() returns int
language plpgsql as $$
  begin
create temp table x(i int);
return 42;
  end;
$$;
  create materialized view mv1 as select set(); -- fails
  explain analyze
create materialized view mv1 as select set(); -- succeeds

Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we
should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too?

Comments?

Regards,
Jeff Davis





Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-08-01 Thread Jeff Davis
On Wed, 2024-07-31 at 18:20 +0900, Yugo NAGATA wrote:
> I agree that it might not be important, but I think adding the flag
> would be
> also helpful for improving code-readability because it clarify the
> function
> is used in the two cases. I attached patch for this fix (patch 0003).

Committed with one minor modification: I moved the boolean flag to be
near the other booleans rather than at the end. Thank you.

> Sure. I fixed the patch to remove 'param' from both functions. (patch
> 0002)

Committed, thank you.

> I also add the small refactoring around ExecCreateTableAs(). (patch
> 0001)
> 
> - Remove matview-related codes from intorel_startup.
>   Materialized views are no longer handled in this function.
> 
> - RefreshMatViewByOid is moved to just after create_ctas_nodata
>   call to improve code readability.
> 

I'm not sure the changes in intorel_startup() are correct. I tried
adding an Assert(into->viewQuery == NULL), and it fails because there's
another path I did not consider: "EXPLAIN ANALYZE CREATE MATERIALIZED
VIEW ...", which does not go through ExecCreateTableAs() but does go
through CreateIntoRelDestReceiver().

See:

https://postgr.es/m/20444c382e6cb5e21e93c94d679d0198b0dba4dd.ca...@j-davis.com

Should we refactor a bit and try to make EXPLAIN use the same code
paths?

Regards,
Jeff Davis





Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Kirill Reshke
On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
> Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we
> should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too?
Sure

> Comments?
Seems like this is indeed inconsistent behaviour and should be fixed
in all PGDG-supported versions in the upcoming August release.
Do you have any suggestions on how to fix this?


> Regards,
> Jeff Davis




Re: Asymmetric partition-wise JOIN

2024-08-01 Thread Alexander Korotkov
Hi!

On Tue, Apr 2, 2024 at 6:07 AM Andrei Lepikhov
 wrote:
> On 15/10/2023 13:25, Alexander Korotkov wrote:
> > Great!  I'm looking forward to the revised patch.
> Revising the code and opinions before restarting this work, I found two
> different possible strategies mentioned in the thread:
> 1. 'Common Resources' shares the materialised result of the inner table
> scan (a hash table in the case of HashJoin) to join each partition one
> by one. It gives us a profit in the case of parallel append and possibly
> other cases, like the one shown in the initial message.
> 2. 'Individual strategies' - By limiting the AJ feature to cases when
> the JOIN clause contains a partitioning expression, we can push an
> additional scan clause into each copy of the inner table scan, reduce
> the number of tuples scanned, and even prune something because of proven
> zero input.
>
> I see the pros and cons of both approaches. The first option is more
> straightforward, and its outcome is obvious in the case of parallel
> append. But how can we guarantee the same join type for each join? Why
> should we ignore the positive effect of different strategies for
> different partitions?
> The second strategy is more expensive for the optimiser, especially in
> the multipartition case. But as I can predict, it is easier to implement
> and looks more natural for the architecture. What do you think about that?

Actually, the idea I tried to express is the combination of #1 and #2:
to build an individual plan for every partition, but consider the 'Common
Resources'.  Let me explain this a bit more.

Right now, we basically consider the following properties during
selection of paths.
1) Cost.  The cheaper path wins.  There are two criteria though: startup
cost and total cost.  So, we can keep both paths with cheaper startup
cost and paths with cheaper total cost.
2) Pathkeys.  We can keep a more expensive path if it has pathkeys
potentially useful in the future.

My idea is to introduce a new property for path selection.
3) Usage of common resources.  A common resource can be: a hash
representation of a relation, a memoize node over a relation scan, etc.  We
can exclude the cost of common resource generation from the path cost, but
keep a reference to the common resource along with its generation cost.
If one path uses more common resources than another path, it can
cost-dominate the other only if it's cheaper together with its extra
common resources cost.  If one path uses fewer or equal common
resources than another, it can cost-dominate the other as usual.

Using these rules, we can gather the plurality of paths for each
child join taking common resources into account.  After that we can
apply some global optimization to find which common resources are
worth generating to reduce the global cost.
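
To illustrate rule 3, here is a hypothetical sketch (not existing planner
code; the CommonResource struct is invented for the example) of a dominance
test that charges a path for the extra common resources it needs:

typedef struct CommonResource
{
    Cost        generation_cost;    /* one-time cost to build the shared
                                     * resource (hash table, memoize, ...) */
} CommonResource;

/*
 * Path A cost-dominates path B only if A is still cheaper after paying the
 * generation cost of the common resources A uses that B does not.
 */
static bool
path_cost_dominates(Cost cost_a, List *resources_a,
                    Cost cost_b, List *resources_b)
{
    Cost        extra_a = 0;
    ListCell   *lc;

    foreach(lc, resources_a)
    {
        CommonResource *res = (CommonResource *) lfirst(lc);

        if (!list_member_ptr(resources_b, res))
            extra_a += res->generation_cost;
    }

    return cost_a + extra_a <= cost_b;
}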

However, I understand this is a huge amount of work given we have to
introduce new basic optimizer concepts.  I get that the main
application of this patch is sharding.  If we have global tables
residing on each shard, we can push down any joins with them.  Given this
patch gives some optimization for the non-sharded case, I think we
*probably* can accept its concept even though this optimization is
obviously not perfect.

--
Regards,
Alexander Korotkov
Supabase




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Kirill Reshke
On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
>
> EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through
> ExecCreateTableAs(), but does use CreateIntoRelDestReceiver().

Both EXPLAIN ANALYZE and the regular query go through create_ctas_internal
(the WITH NO DATA case too). Maybe we can simply move the
SetUserIdAndSecContext call into this function?




Re: Support LIKE with nondeterministic collations

2024-08-01 Thread Daniel Verite
Jeff Davis wrote:

> >   col LIKE 'smith%' collate "nd"
> > 
> > is equivalent to:
> > 
> >   col >= 'smith' collate "nd" AND col < U&'smith\' collate "nd"
> 
> That logic seems to assume something about the collation. If you have a
> collation that orders strings by their sha256 hash, that would entirely
> break the connection between prefixes and ranges, and it wouldn't work.

Indeed I would not trust this trick to work outside of ICU collations.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Jeff Davis
On Fri, 2024-08-02 at 00:13 +0500, Kirill Reshke wrote:
> On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
> > 
> > EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through
> > ExecCreateTableAs(), but does use CreateIntoRelDestReceiver().
> 
> EXPLAIN ANALYZE and regular query goes through create_ctas_internal
> (WITH NO DATA case too). Maybe we can simply move
> SetUserIdAndSecContext call in this function?

We need to set up the SECURITY_RESTRICTED_OPERATION before planning, in
case the planner executes some functions.

I believe we need to do some more refactoring to make this work. In
version 17, I already refactored so that CREATE MATERIALIZED VIEW and
REFRESH MATERIALIZED VIEW share the code. We can do something similar
to extend that to EXPLAIN ... CREATE MATERIALIZED VIEW.
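
A minimal sketch of the ordering I mean (hypothetical, not the actual
refactoring; relowner stands for the materialized view's owner):

static void
execute_matview_query_restricted(Oid relowner)
{
    Oid         save_userid;
    int         save_sec_context;

    GetUserIdAndSecContext(&save_userid, &save_sec_context);
    SetUserIdAndSecContext(relowner,
                           save_sec_context | SECURITY_RESTRICTED_OPERATION);

    /* ... plan and execute the CREATE MATERIALIZED VIEW query here ... */

    SetUserIdAndSecContext(save_userid, save_sec_context);
}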

As for the August release, the code freeze is on Saturday. Perhaps it
can be done by then, but is there a reason we should rush it? This
affects all supported versions, so we've lived with it for a while, and
I don't see a security problem here. I wouldn't expect it to be a
common use case, either.

Regards,
Jeff Davis





Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Jacob Champion
On Wed, Jul 31, 2024 at 1:26 PM Robert Haas  wrote:
> However, the risk is that an end-user is going to be much less able to
> evaluate what is and isn't safe than we are. I think some people are
> going to be like -- well the core project doesn't mark enough stuff
> leakproof, so I'll just go add markings to a bunch of stuff myself.
> And they probably won't stop at stuff like UPPER which is almost
> leakproof. They might add it to stuff such as LIKE which results in
> immediately giving away the farm. By not giving people any guidance,
> we invite them to make up their own rules.

+1.

Would it provide enough value for effort to explicitly mark leaky
procedures as such? Maybe that could shrink the grey area enough to be
protective?

--Jacob




Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Jacob Champion
On Thu, Aug 1, 2024 at 7:26 AM Tom Lane  wrote:
> Are you proposing that we invent two levels of leakproofness
> with different guarantees?  That seems like a mess, not least
> because there are going to be varying opinions about where we
> should set the bar for the lower level.

It kind of reminds me of the kernel's "paranoia level" tunables, which
seem to proliferate in weird ways [1] and not make for a particularly
great UX.

--Jacob

[1] 
https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do




Re: Pluggable cumulative statistics

2024-08-01 Thread Michael Paquier
On Sun, Jul 28, 2024 at 10:03:56PM +0200, Dmitry Dolgov wrote:
> So far I've got nothing against :)

I've looked again at the first patch of this series, and applied it.
Another last-minute edit I have done is to use PgStat_Kind more
consistently in the loops over the stats kinds across all
the pgstats code.

Attached is a rebased set of the rest, with 0001 now introducing the
pluggable core part.
--
Michael
From af5a2456f90bf02b993c7c718e45a4cd6d863c79 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 2 Aug 2024 05:42:52 +0900
Subject: [PATCH v3 1/5] Introduce pluggable APIs for Cumulative Statistics

This commit adds support in the backend for $subject, allowing
out-of-core extensions to add their own custom statistics kinds.  The
stats kinds are divided into two parts for efficiency:
- The built-in stats kinds, with designated initializers.
- The custom kinds, able to use a range of IDs (128 slots available as
of this patch), with information saved in TopMemoryContext.

Custom cumulative statistics can only be loaded with
shared_preload_libraries at startup, and must allocate a unique ID
shared across all the PostgreSQL extension ecosystem with the following
wiki page:
https://wiki.postgresql.org/wiki/CustomCumulativeStats

This is able to support fixed-numbered (like WAL, archiver, bgwriter)
and variable-numbered stats kinds.
---
 src/include/pgstat.h  |  35 +++-
 src/include/utils/pgstat_internal.h   |  22 ++-
 src/backend/utils/activity/pgstat.c   | 231 +++---
 src/backend/utils/activity/pgstat_shmem.c |  31 ++-
 src/backend/utils/adt/pgstatfuncs.c   |   2 +-
 5 files changed, 287 insertions(+), 34 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f84e9fdeca..cd93a0109b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -35,6 +35,10 @@
 /* The types of statistics entries */
 #define PgStat_Kind uint32
 
+/* Range of IDs allowed, for built-in and custom kinds */
+#define PGSTAT_KIND_MIN	1		/* Minimum ID allowed */
+#define PGSTAT_KIND_MAX	256		/* Maximum ID allowed */
+
 /* use 0 for INVALID, to catch zero-initialized data */
 #define PGSTAT_KIND_INVALID 0
 
@@ -53,9 +57,34 @@
 #define PGSTAT_KIND_SLRU	10
 #define PGSTAT_KIND_WAL	11
 
-#define PGSTAT_KIND_FIRST_VALID PGSTAT_KIND_DATABASE
-#define PGSTAT_KIND_LAST PGSTAT_KIND_WAL
-#define PGSTAT_NUM_KINDS (PGSTAT_KIND_LAST + 1)
+#define PGSTAT_KIND_MIN_BUILTIN PGSTAT_KIND_DATABASE
+#define PGSTAT_KIND_MAX_BUILTIN PGSTAT_KIND_WAL
+
+/* Custom stats kinds */
+
+/* Range of IDs allowed for custom stats kinds */
+#define PGSTAT_KIND_CUSTOM_MIN	128
+#define PGSTAT_KIND_CUSTOM_MAX	PGSTAT_KIND_MAX
+#define PGSTAT_KIND_CUSTOM_SIZE	(PGSTAT_KIND_CUSTOM_MAX - PGSTAT_KIND_CUSTOM_MIN + 1)
+
+/*
+ * PgStat_Kind to use for extensions that require an ID, but are still in
+ * development and have not reserved their own unique kind ID yet. See:
+ * https://wiki.postgresql.org/wiki/CustomCumulativeStats
+ */
+#define PGSTAT_KIND_EXPERIMENTAL	128
+
+static inline bool
+pgstat_is_kind_builtin(PgStat_Kind kind)
+{
+	return kind > PGSTAT_KIND_INVALID && kind <= PGSTAT_KIND_MAX_BUILTIN;
+}
+
+static inline bool
+pgstat_is_kind_custom(PgStat_Kind kind)
+{
+	return kind >= PGSTAT_KIND_CUSTOM_MIN && kind <= PGSTAT_KIND_CUSTOM_MAX;
+}
 
 /* Values for track_functions GUC variable --- order is significant! */
 typedef enum TrackFunctionsLevel
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 778f625ca1..39f63362a3 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -195,7 +195,8 @@ typedef struct PgStat_KindInfo
 
 	/*
 	 * The size of an entry in the shared stats hash table (pointed to by
-	 * PgStatShared_HashEntry->body).
+	 * PgStatShared_HashEntry->body).  For fixed-numbered statistics, this is
+	 * the size of an entry in PgStat_ShmemControl->custom_data.
 	 */
 	uint32		shared_size;
 
@@ -446,6 +447,13 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
+
+	/*
+	 * Custom stats data with fixed-numbered objects, indexed by (PgStat_Kind
+	 * - PGSTAT_KIND_CUSTOM_MIN).
+	 */
+	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
+
 } PgStat_ShmemControl;
 
 
@@ -459,7 +467,7 @@ typedef struct PgStat_Snapshot
 	/* time at which snapshot was taken */
 	TimestampTz snapshot_timestamp;
 
-	bool		fixed_valid[PGSTAT_NUM_KINDS];
+	bool		fixed_valid[PGSTAT_KIND_MAX_BUILTIN + 1];
 
 	PgStat_ArchiverStats archiver;
 
@@ -473,6 +481,14 @@ typedef struct PgStat_Snapshot
 
 	PgStat_WalStats wal;
 
+	/*
+	 * Data in snapshot for custom fixed-numbered statistics, indexed by
+	 * (PgStat_Kind - PGSTAT_KIND_CUSTOM_MIN).  Each entry is allocated in
+	 * TopMemoryContext, for a size of shared_data_len.
+	 */
+	bool		custom_valid[PGSTAT_KIND_CUSTOM_SIZE];
+	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
+
 	/* to free snapshot in bulk */
 	MemoryContext co

Re: Improving tracking/processing of buildfarm test failures

2024-08-01 Thread Andrew Dunstan



On 2024-08-01 Th 5:00 AM, Alexander Lakhin wrote:
> I also wrote a simple script (see attached) to check for unknown buildfarm
> failures using "HTML API", to make sure no failures are missed. Surely, it
> could be improved in many ways, but I find it rather useful as-is.

I think we can improve on that. Scraping HTML is not a terribly 
efficient way of doing it. I'd very much like to improve the reporting 
side of the server.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Official devcontainer config

2024-08-01 Thread Andrew Dunstan



On 2024-08-01 Th 10:56 AM, Junwang Zhao wrote:

Stack Overflow 2024 developer survey[1] said VSCode
is the most used development environment.

In a PostgreSQL Hacker Mentoring discussion, we talked
about how to use vscode to debug and running postgres,
Andrey(ccd) has tons of tips for new developers, and
I post my daily used devcontainer config[2] , Jelte(ccd)
suggested that it might be a good idea we integrate the
config into postgres repo so that the barrier to entry for
new developers will be much lower.

**Note**

This is not intended to change the workflow of experienced
hackers, it is just hoping to make the life easier for
beginner developers.

**How to use**

Open VSCode Command Palette(cmd/ctrl + shift + p),
search devcontainer, then choose something like
`Dev containers: Rebuild and Reopen in Container`, you are
good to go.

**About the patch**

devcontainer.json:

The .devcontainer/devcontainer.json is the entry point for
VSCode to *open folder to develop in a container*, it will build
the docker image for the first time you open in container,
this will take some time.

There are some parameters(runArgs) for running the container,
we need some settings and privileges to run perf or generate
core dumps.

It has a mount point mapping the hosts $HOME/freedom
to container's /opt/freedom, I chose the name *freedom*
because the container is kind of a jail.

It also installed some vscode extensions and did some
customizations.

After diving into the container, the postCreateCommand.sh
will be automatically called, it will do some configurations
like git, perf, .vscode, core_pattern, etc. It also downloads
michaelpq's pg_plugins and FlameGraph.

Dockerfile:

It is based on debian bookworm, it installed dependencies
to build postgres, also IPC::Run to run TAP tests I guess.

It also has a .gdbinit to break elog.c:errfinish for elevel 21,
this will make the debugging easier why error is logged.

gdbpg.py is adapted from https://github.com/tvondra/gdbpg,
I think putting it here will make it evolve as time goes.

tasks.json:

This is kind of like a bookkeeping for developers, it has
the following commands:

- meson debug setup
- meson release setup
- ninja build
- regression tests
- ninja install
- init cluster
- start cluster
- stop cluster
- install pg_bsd_indent
- pgindent
- apply patch
- generate flamegraph

launch.json:

It has one configuration that makes it possible to attach
to one process(e.g. postgres backend) and debug
with vscode.

PFA and please give it a try if you are a VSCode user.

[1]: 
https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment
[2]: https://github.com/atomicdb/devcontainer/tree/main/postgres



Not totally opposed, and I will probably give it a try very soon, but 
I'm wondering if this really needs to go in the core repo. We've 
generally shied away from doing much in the way of editor / devenv 
support, trying to be fairly agnostic. It's true we carry .dir-locals.el 
and .editorconfig, so that's not entirely true, but those are really 
just about supporting our indentation etc. standards.


Also, might it not be better for this to be carried in a separate repo 
maintained by people using vscode? I don't know how many committers do. 
Maybe lots do, and I'm just a dinosaur. If vscode really needs 
.devcontainer to live in the code root, maybe it could be symlinked. 
Another reason not to carry the code ourselves is that it will make it less 
convenient for people who want to customize it.


Without having tried it, looks like a nice effort though. Well done.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-01 Thread Nathan Bossart
On Thu, Aug 01, 2024 at 12:44:35PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
>> I like your idea of parallelizing these checks with async libpq API, thanks
>> for working on it. The patch doesn't apply cleanly on master anymore, but
>> I've rebased locally and taken it for a quick spin with a pg16 instance of
>> 1000 empty databases. Didn't see any regressions with -j 1, there's some
>> speedup with -j 8 (33 sec vs 8 sec for these checks).
> 
> Thanks for taking a look.  I'm hoping to do a round of polishing before
> posting a rebased patch set soon.
> 
>> One thing that I noticed that could be improved is we could start a new
>> connection right away after having run all query callbacks for the current
>> connection in process_slot, instead of just returning and establishing the
>> new connection only on the next iteration of the loop in async_task_run
>> after potentially sleeping on select.
> 
> Yeah, we could just recursively call process_slot() right after freeing the
> slot.  That'd at least allow us to avoid the spinning behavior as we run
> out of databases to process, if nothing else.

Here is a new patch set.  Besides rebasing, I've added the recursive call
to process_slot() mentioned in the quoted text, and I've added quite a bit
of commentary to async.c.

-- 
nathan
>From 0331c1bbb8bee8fa079c761c4a0174212312e709 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 28 Jun 2024 11:02:44 -0500
Subject: [PATCH v7 01/11] introduce framework for parallelizing pg_upgrade
 tasks

---
 src/bin/pg_upgrade/Makefile  |   1 +
 src/bin/pg_upgrade/async.c   | 505 +++
 src/bin/pg_upgrade/meson.build   |   1 +
 src/bin/pg_upgrade/pg_upgrade.h  |  16 +
 src/tools/pgindent/typedefs.list |   4 +
 5 files changed, 527 insertions(+)
 create mode 100644 src/bin/pg_upgrade/async.c

diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..3bc4f5d740 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -12,6 +12,7 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = \
$(WIN32RES) \
+   async.o \
check.o \
controldata.o \
dump.o \
diff --git a/src/bin/pg_upgrade/async.c b/src/bin/pg_upgrade/async.c
new file mode 100644
index 00..b5b36c4725
--- /dev/null
+++ b/src/bin/pg_upgrade/async.c
@@ -0,0 +1,505 @@
+/*
+ * async.c
+ * parallelization via libpq's async APIs
+ *
+ * This framework provides an efficient way of running the various
+ * once-in-each-database tasks required by pg_upgrade.  Specifically, it
+ * parallelizes these tasks by managing a simple state machine of
+ * user_opts.jobs slots and using libpq's asynchronous APIs to establish the
+ * connections and run the queries.  Callers simply need to create a couple of
+ * callback functions and build/execute an AsyncTask.  A simple example
+ * follows:
+ *
+ * static char *
+ * my_query_cb(void *arg)
+ * {
+ * // NB: RESULT MUST BE ALLOC'D!
+ * return pg_strdup("... query text ...");
+ * }
+ *
+ * static void
+ * my_process_cb(DbInfo *dbinfo, PGresult *res, void *arg)
+ * {
+ * for (int i = 0; i < PQntuples(res); i++)
+ * {
+ * ... process results ...
+ * }
+ * }
+ *
+ * void
+ * my_task(ClusterInfo *cluster)
+ * {
+ * AsyncTask  *task = async_task_create();
+ *
+ * async_task_add_step(task,
+ * my_query_cb,
+ * my_process_cb,
+ * true,   // let 
the task free the PGresult
+ * NULL);  // 
"arg" pointer for the callbacks
+ * async_task_run(task, cluster);
+ * async_task_free(task);
+ * }
+ *
+ * Note that multiple steps can be added to a given task.  When there are
+ * multiple steps, the task will run all of the steps consecutively in the same
+ * database connection before freeing the connection and moving on.  In other
+ * words, it only ever initiates one connection to each database in the
+ * cluster for a given run.
+ *
+ * Also note that the query callbacks (i.e., the AsyncTaskGetQueryCB given to
+ * the task) are only run once, and their result is stored and reused in all
+ * databases.  In practice, this means that the query callbacks cannot depend
+ * on anything database-specific.  If you find yourself trying to use
+ * database-specific queries, chances are this is the wrong tool for the job.
+ *
+ * As shown in the example above, the query callbacks must return an 

Re: Asymmetric partition-wise JOIN

2024-08-01 Thread Alexander Korotkov
On Sun, May 5, 2024 at 5:55 PM Andrei Lepikhov  wrote:
> On 18/10/2023 16:59, Ashutosh Bapat wrote:
> > On Wed, Oct 18, 2023 at 10:55 AM Andrei Lepikhov
> >>> The relid is also used to track the scans at executor level. Since we
> >>> have so many scans on A, each may be using different plan, we will
> >>> need different ids for those.
> >>
> >> I don't understand this sentence. Which way executor uses this index of
> >> RelOptInfo ?
> >
> > See Scan::scanrelid
> >
> Hi,
>
> In the attachment, you will find a fresh version of the patch.
> I've analysed the danger of the same RelOptInfo index for the executor.
> In the examples I found (scared), it is still not a problem because
> ExecQual() does all the jobs at one operation and doesn't intersect with
> over operations. Of course, it is not a good design, and we will work on
> this issue. But at least this code can be used in experiments.
> Furthermore, I've shared some reflections on this feature. To avoid
> cluttering the thread, I've published them in [1]. These thoughts
> provide additional context and considerations for our ongoing work.
>
> [1]
> https://danolivo.substack.com/p/postgresql-asymmetric-join-technique?r=34q1yy

I've rebased the patch to the current master.  Also, I didn't like the
needFlatCopy argument to reparameterize_path_by_child().  It looks
quite awkward.  Instead, since we need to copy paths anyway, I've
enabled native copying of Path nodes.  Now, we can just do copyObject() over
the path in the caller.  Looks much cleaner to me.  What do you think?
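
For illustration, the caller side now reduces to something like this (a
sketch with assumed variable names, not an excerpt from the patch):

/* the caller copies the path itself, now that Path nodes are copyable */
Path       *inner_copy = (Path *) copyObject(inner_path);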

Other notes:

1) I think we need to cover the cases that
is_inner_rel_safe_for_asymmetric_join() filters out with regression
tests.
2) is_asymmetric_join() looks awkward to me.  Should we instead make
it a flag in JoinPath?
3) I understand that you re-use the same RelOptInfo multiple times.  It's
too late a stage of query processing to add a simple relation into the
planner structs.  I tried rescans issued by cursors and EvalPlanQual()
caused by concurrent updates, but didn't manage to break this.  It
seems that even if the same relation index is used multiple times in
different places of a query, it never gets used simultaneously.  But
even if this somehow is OK, this is a significant change of assumptions
in planner/executor data structures.  Perhaps we need at least Tom's
opinion on this.

--
Regards,
Alexander Korotkov
Supabase




Casts from jsonb to other types should cope with json null

2024-08-01 Thread Tom Lane
I complained in the discussion of bug #18564 [1] that it's quite
inconsistent that you can cast a jsonb null to text and get
a SQL NULL:

=# select ('{"a": null}'::jsonb)->>'a';
 ?column? 
--
 
(1 row)

but if you cast it to any other type it's an error:

=# select (('{"a": null}'::jsonb)->'a')::float8;
ERROR:  cannot cast jsonb null to type double precision

I think this should be allowed and should produce a SQL NULL.
It doesn't look hard: the attached POC patch fixes this for
the float8 case only.  If there are no conceptual objections,
I can flesh this out to cover the other jsonb-to-XXX
cast functions.
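
For concreteness, with the POC applied I'd expect:

=# select (('{"a": null}'::jsonb)->'a')::float8;  -- SQL NULL instead of the error
=# select (('{"a": 1.5}'::jsonb)->'a')::float8;   -- still returns 1.5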

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/18564-5985f90678ed7512%40postgresql.org

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 928552d551..91f1059e4c 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -2148,7 +2148,16 @@ jsonb_float8(PG_FUNCTION_ARGS)
 	JsonbValue	v;
 	Datum		retValue;
 
-	if (!JsonbExtractScalar(&in->root, &v) || v.type != jbvNumeric)
+	if (!JsonbExtractScalar(&in->root, &v))
+		cannotCastJsonbValue(v.type, "double precision");
+
+	if (v.type == jbvNull)
+	{
+		PG_FREE_IF_COPY(in, 0);
+		PG_RETURN_NULL();
+	}
+
+	if (v.type != jbvNumeric)
 		cannotCastJsonbValue(v.type, "double precision");
 
 	retValue = DirectFunctionCall1(numeric_float8,


Re: Why is citext/regress failing on hamerkop?

2024-08-01 Thread Thomas Munro
On Fri, Aug 2, 2024 at 1:37 AM Oleg Tselebrovskiy
 wrote:
> I would appreciate if you would backpatch this change to 15 and 16
> branches.

Done (e52a44b8, 91f498fd).

Any elucidation on how and why Windows machines have started using
UTF-8 would be welcome.




Re: Use pgBufferUsage for block reporting in analyze

2024-08-01 Thread Masahiko Sawada
On Wed, Jul 31, 2024 at 11:27 PM Anthonin Bonnefoy
 wrote:
>
> On Wed, Jul 31, 2024 at 9:36 PM Masahiko Sawada  wrote:
> > Meanwhile, I think we can push 0001 and 0002 patches since they are in
> > good shape. I've updated commit messages to them and slightly changed
> > 0002 patch to write "finished analyzing of table \"%s.%s.%s\" instead
> > of  "analyze of table \"%s.%s.%s\".
>
> Wouldn't it make sense to do the same for autoanalyze and write
> "finished automatic analyze of table \"%s.%s.%s\"\n" instead of
> "automatic analyze of table \"%s.%s.%s\"\n"?

I think that the current style is consistent with autovacuum logs:

2024-08-01 16:04:48.088 PDT [12302] LOG:  automatic vacuum of table
"postgres.public.test": index scans: 0
pages: 0 removed, 443 remain, 443 scanned (100.00% of total)
tuples: 0 removed, 10 remain, 0 are dead but not yet removable
removable cutoff: 751, which was 0 XIDs old when operation ended
new relfrozenxid: 739, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had
0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 1.466 MB/s
buffer usage: 905 hits, 0 reads, 4 dirtied
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2024-08-01 16:04:48.125 PDT [12302] LOG:  automatic analyze of table
"postgres.public.test"
avg read rate: 5.551 MB/s, avg write rate: 0.617 MB/s
buffer usage: 512 hits, 27 reads, 3 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s

Since ANALYZE command writes the start log, I think it makes sense to
write "finished" at the end of the operation:

=# analyze verbose test;
INFO:  analyzing "public.test"
INFO:  "test": scanned 443 of 443 pages, containing 10 live rows
and 0 dead rows; 3 rows in sample, 10 estimated total rows
INFO:  finished analyzing table "postgres.public.test"
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 549 hits, 0 reads, 0 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s
ANALYZE

>
> > Also, regarding 0003 patch, what is the main reason why we want to add
> > WAL usage to analyze reports? I think that analyze normally does not
> > write WAL records much so I'm not sure it's going to provide a good
> > insight for users.
>
> There was no strong reason except for consistency with VACUUM VERBOSE
> output. But as you said, it's not really providing valuable
> information so it's probably better to keep the noise down and drop
> it.

Okay. I think writing WAL usage would not be very noisy and could
probably help in cases where (auto)analyze unexpectedly writes many WAL
records (e.g., writing a lot of full page images), and it is consistent with
(auto)vacuum logs as you mentioned. So let's go with this direction
unless others think differently.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Casts from jsonb to other types should cope with json null

2024-08-01 Thread Maciek Sakrejda
On Thu, Aug 1, 2024 at 3:52 PM Tom Lane  wrote:
> I complained in the discussion of bug #18564 [1] that it's quite
> inconsistent that you can cast a jsonb null to text and get
> a SQL NULL:
>
> =# select ('{"a": null}'::jsonb)->>'a';
>  ?column?
> --
>
> (1 row)

Oddly, it looks like you only get a null if you use the '->>'
operator. With '->' and a subsequent cast to text, you get the string
"null":

maciek=# select (('{"a":null}'::jsonb)->'a')::text;
 text
--
 null
(1 row)

Is that expected?




Re: Casts from jsonb to other types should cope with json null

2024-08-01 Thread Tom Lane
Maciek Sakrejda  writes:
> Oddly, it looks like you only get a null if you use the '->>'
> operator. With '->' and a subsequent cast to text, you get the string
> "null":

> maciek=# select (('{"a":null}'::jsonb)->'a')::text;
>  text
> --
>  null
> (1 row)

> Is that expected?

I think what is happening there is you're getting the fallback
"cast via I/O" behavior.  There's no jsonb->text cast function
in the catalogs.
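
One way to see that from SQL (a quick sketch of the catalog check; it
should come back empty):

=# select castsource::regtype, casttarget::regtype, castmethod
   from pg_cast
   where castsource = 'jsonb'::regtype and casttarget = 'text'::regtype;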

Perhaps it's worth adding one, so that it can be made to behave
similarly to the casts to other types.  However, that would be
a compatibility break for a case that doesn't fail today, so
it might be a harder sell than changing cases that do fail.
I'm mildly in favor of doing it, but somebody else might
think differently.

regards, tom lane




Re: can we mark upper/lower/textlike functions leakproof?

2024-08-01 Thread Robert Haas
On Thu, Aug 1, 2024 at 4:45 PM Jacob Champion
 wrote:
> Would it provide enough value for effort to explicitly mark leaky
> procedures as such? Maybe that could shrink the grey area enough to be
> protective?

You mean like proleakproof = true/false/maybe?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Query results vary depending on the plan cache mode used

2024-08-01 Thread Richard Guo
On Thu, Aug 1, 2024 at 10:34 PM Tom Lane  wrote:
> Richard Guo  writes:
> > While working on the grouping sets patches for queries with GROUP BY
> > items that are constants, I noticed $subject on master.  As an
> > example, consider
>
> This particular example seems like it's just an illustration of
> our known bugs with grouping sets using non-Var columns.  I see
> no reason to blame the plan cache for it.  Do you have any
> examples that don't depend on such a bug?

Yeah, it's not the fault of the plan cache.  I noticed this because in
check_ungrouped_columns, both Const and Param are treated as always
acceptable.  However, in setrefs.c these two expression types are
handled differently: Const is never matched to the lower tlist, while
Param is always attempted to be matched to the lower tlist.  This
discrepancy causes the query in the example to behave differently
depending on the plan cache mode.

I'm unsure which behavior is the expected one.


Another related question I have is about the behavior of the following
query:

select 3 as c1, 3 as c2 from generate_series(1,2) t group by
rollup(c1) having 3 = 3;

Should the target entry 'c2', as well as the Const expressions in
havingQual, be replaced with references to the grouping key?

Thanks
Richard




Re: PG buildfarm member cisticola

2024-08-01 Thread Thomas Munro
On Thu, Jul 25, 2024 at 8:38 PM Michael Paquier  wrote:
> On Wed, Jul 24, 2024 at 01:09:22PM +0200, Alvaro Herrera wrote:
> > Hmm, it appears that these symbols did not exist in 1.1.1g, and since
> > neither of them is invoked directly by the Postgres code, maybe the
> > reason for this is that the OpenSSL headers being used do not correspond
> > to the library being linked.

Yeah, the OpenSSL 3.x header ssl.h contains:

/* Deprecated in 3.0.0 */
#  ifndef OPENSSL_NO_DEPRECATED_3_0
#   define SSL_get_peer_certificate SSL_get1_peer_certificate
#  endif

PostgreSQL uses the name on the left, but OpenSSL 1.x library does not
contain the symbol on the right with the 1 in it.  Part of a strategy
for deprecating that API, but perhaps on that OS it is possible to
install both OpenSSL versions, and something extra is needed for the
header search path and library search path to agree?  I don't know
what Loongnix is exactly, but it has the aura of a RHEL-derivative.




wrong translation file reference in pg_createsubscriber

2024-08-01 Thread Kyotaro Horiguchi
Hi,

I found that pg_createsubscriber doesn't use NLS files. This is due to
the wrong reference name "pg_createsubscriber" being passed to
set_pglocale_pgservice(). It should be "pg_basebackup" instead. See
the attached patch.

# Sorry for being away for a while. I should be able to return soon.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From ede9fecf6a042e87c5cf92ed34d6b1991646577b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 2 Aug 2024 11:33:52 +0900
Subject: [PATCH] Fix NLS file reference in pg_createsubscriber

pg_createsubscriber is referring to a non-existent message translation
file, causing NLS to not work correctly. This command should use the
same file as pg_basebackup.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index aab9bf0f51..4bce45e1e4 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -1904,7 +1904,7 @@ main(int argc, char **argv)
 	pg_logging_init(argv[0]);
 	pg_logging_set_level(PG_LOG_WARNING);
 	progname = get_progname(argv[0]);
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_createsubscriber"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_basebackup"));
 
 	if (argc > 1)
 	{
-- 
2.43.5



Re: On disable_cost

2024-08-01 Thread David Rowley
On Fri, 2 Aug 2024 at 06:03, Robert Haas  wrote:
> I think this may be a bit hard to understand, so let me give a
> concrete example. Suppose we're planning some join where one side can
> only be planned with a sequential scan and sequential scans are
> disabled. We have ten paths in the path list and they have costs of
> 1e10+100, 1e10+200, ..., 1e10+1000. Now add_path_precheck() is asked
> to consider a new path where there is a disabled node on BOTH sides of
> the join -- the one side has the disabled sequential scan, but now the
> other side also has something disabled, so the cost is let's say
> 2e10+79. add_path_precheck() can see at once that this path is a
> loser: it can't possibly dominate any path that already exists,
> because it costs more than any of them. But when you take disable_cost
> out, things look quite different. Now you have a proposed path with a
> total_cost of 79 and a path list with costs of 100, ..., 1000. If
> you're not allowed to know anything about disabled_nodes, the new path
> looks like it might be valuable. You might decide to construct it and
> try inserting into the pathlist, which will end up being useless, and
> even if you don't, you're going to compare its pathkeys and
> parameterization to each of the 10 existing paths before giving up.
> Bummer.

OK, so it sounds like you'd like to optimise this code so that the
planner does a little less work when node types are disabled. The
existing comment does mention explicitly that we don't want to do
that:

/*
* We could include disable_cost in the preliminary estimate, but that
* would amount to optimizing for the case where the join method is
* disabled, which doesn't seem like the way to bet.
*/

As far as I understand it from reading the comments in that file, I
see no offer of guarantees that the initial cost will be cheaper than
the final cost. So what you're proposing could end up rejecting paths
based on initial cost where the final cost might end up being the
cheapest path. Imagine you're considering a Nested Loop and a Hash
Join, both of which are disabled. Merge Join is unavailable as the
join column types are not sortable.  If the hash join costs 99 and the
initial nested loop costs 110, but the final nested loop ends up
costing 90, then the nested loop could be rejected before we even get
to perform the final cost for it. The current code will run
final_cost_nestloop() and find that 90 is cheaper than 99, whereas
what you want to do is stop bothering with nested loop when we see the
initial cost come out at 110.

Perhaps it's actually fine if the initial costs are always less than
the final costs as, if that's the case, we won't ever reject any paths
based on the initial cost that we wouldn't anyway based on the final
cost. However, since there does not seem to be any comments mentioning
this guarantee and if you're just doing this to squeeze more
performance out of the planner, it seems risky to do for that reason
alone.

I'd say if you want to do this, you should be justifying it on its own
merit with some performance numbers and some evidence that we don't
produce inferior plans as a result. But per what I quoted above,
you're not doing that, you're doing this as a performance
optimisation.

I'm not planning on pushing this any further. I've just tried to
highlight that there's the possibility of a behavioural change. You're
claiming there isn't one. I claim there is.

David




Re: rare crash - FailedAssertion snapbuild.c Line: 580

2024-08-01 Thread Alexander Lakhin

Hello Euler,

01.08.2024 21:09, Euler Taveira wrote:

According to this discussion, there isn't a clue about the root cause.
If you have a test case, share it (mainly if you are observing it in
version 16+ that exposes some data which may be useful for analysis).



Please take a look at [1], where I presented a reproducer for apparently
the same issue.

[1] 
https://www.postgresql.org/message-id/b91cf8ef-b5af-5def-ff05-bd67336ef907%40gmail.com

Best regards,
Alexander

Re: Query results vary depending on the plan cache mode used

2024-08-01 Thread Tom Lane
Richard Guo  writes:
> Yeah, it's not the fault of the plan cache.  I noticed this because in
> check_ungrouped_columns, both Const and Param are treated as always
> acceptable.  However, in setrefs.c these two expression types are
> handled differently: Const is never matched to the lower tlist, while
> Param is always attempted to be matched to the lower tlist.  This
> discrepancy causes the query in the example to behave differently
> depending on the plan cache mode.

> I'm unsure which behavior is the expected one.

I'm unsure why you think they need to have the same behavior.

This code is correct given the assumption that a Const is as constant
as it seems on the surface.  The trouble we're hitting is that in
the presence of grouping-set keys that reduce to Consts, a reference
to such a key is *not* constant.  The right answer to that IMO is to
not represent those references as plain Const nodes.

> Another related question I have is about the behavior of the following
> query:

> select 3 as c1, 3 as c2 from generate_series(1,2) t group by
> rollup(c1) having 3 = 3;

> Should the target entry 'c2', as well as the Const expressions in
> havingQual, be replaced with references to the grouping key?

This seems rather analogous to our occasional debates about whether
textually distinct uses of "random()" should refer to the same
value or different evaluations.  I can't get very excited about it,
because it seems to me this is an academic example not corresponding
to any real use-case.  The current code seems to effectively assume
that the constant-instances are distinct even though textually
identical, and that's fine with me.

regards, tom lane




Re: [Patch] remove duplicated smgrclose

2024-08-01 Thread Steven Niu
Hi, Junwang,

Thank you for the review and the excellent summary in the commit message!

This is my first contribution to the community, and I'm not so familiar with
the overall process.
After reading the process again, it looks like I'm not qualified to
submit the patch to a commitfest, as I have never reviewed others' work.  :(
If so, could you please help submit it to the commitfest?

Best Regards,
Steven

Junwang Zhao wrote on Thu, Aug 1, 2024 at 20:32:

> Hi Steven,
>
> On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  wrote:
> >
> > Hello, hackers,
> >
> > I think there may be some duplicated codes.
> > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and
> smgrclose().
> > But both functions would close SMgrRelation object, it's dupliacted
> behavior?
> >
> > So I make this patch. Could someone take a look at it?
> >
> > Thanks for your help,
> > Steven
> >
> > From Highgo.com
> >
> >
> Your change LGTM, but the patch does not seem to apply to HEAD, so
> I generated the attached v2 using `git format-patch` with a commit message.
>
> --
> Regards
> Junwang Zhao
>


Casts from jsonb to other types should cope with json null

2024-08-01 Thread David G. Johnston
On Thursday, August 1, 2024, Tom Lane  wrote:

> Maciek Sakrejda  writes:
> > Oddly, it looks like you only get a null if you use the '->>'
> > operator. With '->' and a subsequent cast to text, you get the string
> > "null":
>
> > maciek=# select (('{"a":null}'::jsonb)->'a')::text;
> >  text
> > ------
> >  null
> > (1 row)
>
> > Is that expected?
>
> I think what is happening there is you're getting the fallback
> "cast via I/O" behavior.  There's no jsonb->text cast function
> in the catalogs.
>
> Perhaps it's worth adding one, so that it can be made to behave
> similarly to the casts to other types.
>

I’m not too keen on opening Pandora’s box here even if I do regret our
current choices.  Semantic casting of json scalar strings only, and doing
document serialization as a function, would have been better in hindsight.

I am fine with implementing the conversion of json null types to SQL null
for all casts that already do semantic value casting, and thus recognize
but prohibit the cast, as shown for float.

I read the discussion thread [1] that added this, and while one person
mentioned json null, no one replied to that point; seemingly no explicit
consideration was ever given to treating json null semantically - i.e. this
fails only because in json, null has its own type, and the tests were type-,
not value-, oriented.  As SQL null is a value only, whose type is whatever
holds it, I’d almost argue that our not doing this constitutes a bug, but I
won’t - turning errors into non-errors has a lower “bug acceptance
threshold”.
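
(For reference, the behaviors discussed above in runnable form; a sketch only,
and the exact error text for the float cast may vary by version:)

```sql
-- Current jsonb null handling (sketch):
SELECT ('{"a":null}'::jsonb)->>'a';           -- SQL NULL: ->> extracts as text
SELECT (('{"a":null}'::jsonb)->'a')::text;    -- the string 'null', via the I/O fallback cast
SELECT (('{"a":null}'::jsonb)->'a')::float8;  -- ERROR: the semantic cast rejects a json null
```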

David J.

[1]
https://www.postgresql.org/message-id/flat/0154d35a-24ae-f063-5273-9ffcdf1c7f2e%40postgrespro.ru


Re: [BUG?] check_exclusion_or_unique_constraint false negative

2024-08-01 Thread Amit Kapila
On Thu, Aug 1, 2024 at 2:55 PM Michail Nikolaev
 wrote:
>
> > Thanks for pointing out the issue!
>
> Thanks for your attention!
>
> > IIUC, the issue can happen when two concurrent transactions using 
> > DirtySnapshot access
> > the same tuples, which is not specific to the parallel apply
>
> Not exactly, it happens for any DirtySnapshot scan over a B-tree index with 
> some other transaction updating the same index page (even using the MVCC 
> snapshot).
>
> So, logical replication related scenario looks like this:
>
> * the subscriber worker receives a tuple update/delete from the publisher
> * it calls RelationFindReplTupleByIndex to find the tuple in the local table
> * some other transaction updates the tuple in the local table (on subscriber 
> side) in parallel
> * RelationFindReplTupleByIndex may not find the tuple because it uses 
> DirtySnapshot
> * the update/delete is lost
>
> Parallel apply mode looks like more dangerous because it uses multiple 
> workers on the subscriber side, so the probability of the issue is higher.
> In that case, "some other transaction" is just another worker applying 
> changes of a different transaction in parallel.
>

I think it is rather less likely, or not possible, in the parallel apply
case, because such conflicting updates (updates on the same tuple)
should be serialized at the publisher itself. So one of the updates
will be applied only after the commit of the transaction containing the other.

I haven't tried a test based on your description of the general
problem with the DirtySnapshot scan. In the case of logical replication, we
will LOG an update_missing type of conflict, and the user may need to take
some manual action based on that. I have not tried a test, so I could
be wrong as well. I am not sure we can do anything specific to logical
replication for this, but feel free to suggest if you have ideas for
solving this problem, either in general or specifically for logical replication.
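
(To make the scenario concrete, a rough sketch of the race being described;
all object names and the connection string are made up, and the timing window
itself cannot be forced from plain SQL:)

```sql
-- Sketch only: hypothetical names, and the race needs precise timing.
-- Publisher:
CREATE TABLE t (id int PRIMARY KEY, val text);
CREATE PUBLICATION pub FOR TABLE t;
INSERT INTO t VALUES (1, 'a');

-- Subscriber:
CREATE TABLE t (id int PRIMARY KEY, val text);
CREATE SUBSCRIPTION sub CONNECTION 'host=... dbname=...' PUBLICATION pub;

-- Publisher sends a replicated change:
UPDATE t SET val = 'b' WHERE id = 1;

-- If a local transaction on the subscriber updates the same row while the
-- apply worker's RelationFindReplTupleByIndex() scan is in progress, the
-- DirtySnapshot index scan may miss the tuple and the replicated update is
-- lost (or surfaces as an update_missing conflict):
UPDATE t SET val = 'local' WHERE id = 1;  -- run on the subscriber, concurrently
```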

-- 
With Regards,
Amit Kapila.




Re: [Patch] remove duplicated smgrclose

2024-08-01 Thread Junwang Zhao
Hi Steven,

On Fri, Aug 2, 2024 at 12:12 PM Steven Niu  wrote:
>
> Hi, Junwang,
>
> Thank you for the review and excellent summary in commit message!
>
> This is my first contribution to the community, and I'm not so familiar with
> the overall process.
> After reading the process again, it looks like I'm not qualified to submit
> the patch to the commitfest, as I have never reviewed others' work.  :(
> If so, could you please help submit it to the commitfest?
>

https://commitfest.postgresql.org/49/5149/

I cannot find your profile on the commitfest site, so I left the author field
empty. Have you registered? If you have an account, you can put your
name in the Authors list.

> Best Regards,
> Steven
>
> Junwang Zhao  于2024年8月1日周四 20:32写道:
>>
>> Hi Steven,
>>
>> On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  wrote:
>> >
>> > Hello, hackers,
>> >
>> > I think there may be some duplicated code.
>> > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and
>> > smgrclose().
>> > But both functions close the SMgrRelation object; isn't that duplicated
>> > behavior?
>> >
>> > So I made this patch. Could someone take a look at it?
>> >
>> > Thanks for your help,
>> > Steven
>> >
>> > From Highgo.com
>> >
>> >
>> Your change LGTM, but the patch does not seem to apply to HEAD, so
>> I generated the attached v2 using `git format-patch` with a commit message.
>>
>> --
>> Regards
>> Junwang Zhao



-- 
Regards
Junwang Zhao




Re: pg_dump: optimize dumpFunc()

2024-08-01 Thread Tom Lane
Nathan Bossart  writes:
> I've recently committed some optimizations for dumping sequences and
> pg_class information (commits 68e9629, bd15b7d, and 2329cad), and I noticed
> that we are also executing a query per function in pg_dump.  Commit be85727
> optimized this by preparing the query ahead of time, but I found that we
> can improve performance further by gathering all the relevant data in a
> single query.  Here are the results I see for a database with 10k simple
> functions with and without the attached patch:

I'm a bit concerned about this on two grounds:

1. Is it a win for DBs with not so many functions?

2. On the other end of the scale, if you've got a *boatload* of
functions, what does it do to pg_dump's memory requirements?
I'm recalling my days at Salesforce, where they had quite a few
thousand pl/pgsql functions totalling very many megabytes of source
text.  (Don't recall precise numbers offhand, and they'd be obsolete
by now even if I did.)

I'm not sure that the results you're showing justify taking any
risk here.
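
(For anyone wanting to check point 1 at different scales, one throwaway way to
create many trivial functions in psql; illustrative only, not taken from the
thread:)

```sql
-- Create 10k trivial SQL functions for a pg_dump timing test (illustrative).
-- \gexec executes each generated CREATE FUNCTION statement.
SELECT format('CREATE FUNCTION f%s() RETURNS int LANGUAGE sql AS $$ SELECT %s $$;',
              g, g)
FROM generate_series(1, 10000) AS g
\gexec
```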

regards, tom lane




RE: [Proposal] Add foreign-server health checks infrastructure

2024-08-01 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san,

> Thanks for updating the patch!
> 
> > - Changed the name of new API from `GetUserMappingFromOid` to
> `GetUserMappingByOid`
> >to keep the name consistent with others.
> 
> If we expose this function as an FDW helper function, it should return
> a complete UserMapping object, including umoptions.
> 
> However, if postgres_fdw_get_connections() is the only user of this function,
> I'm not inclined to expose it as an FDW helper.

One reason is that the function does not handle any postgres_fdw-specific data;
however, there are no users or requests for it from other projects. Based on
that, OK, we can move it to connection.c. If needed, we can export it again.

> Instead, we can either get
> the user ID by user mapping OID directly in connection.c using 
> SearchSysCache(),
> or add the user ID to ConnCacheEntry and use it in
> postgres_fdw_get_connections().
> Thought?

I moved the function to connection.c; it uses SearchSysCache1().

I've tried both ways, and they worked well. One difference is that when we use
the extended ConnCacheEntry approach and the entry has been invalidated, we 
cannot
distinguish the reason. For example, in the below case, the entry is 
invalidated,
so the user_name of the output record will be NULL, whereas the user mapping is
actually still valid. We may be able to add the reason for invalidation, but
I'm not very motivated to modify that part.

```
BEGIN;
SELECT 1 FROM ft1 LIMIT 1; -- ft1 is at server "loopback"
...
ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
...
SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
-> {"loopback", NULL, } will be returned
```

Another reason is that we can keep the code consistent with the server case:
that part does not read data from the entry, so we can follow the same pattern.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v2-0001-Extend-postgres_fdw_get_connections-to-return-use.patch
Description:  v2-0001-Extend-postgres_fdw_get_connections-to-return-use.patch


Re: why there is not VACUUM FULL CONCURRENTLY?

2024-08-01 Thread Antonin Houska
Kirill Reshke  wrote:

> What is the size of the biggest relation successfully vacuumed
> via pg_squeeze?
> It looks like, in the case of a big relation or a high insertion load,
> replication may lag and never catch up...

Users report problems rather than successes, so I don't know. 400 GB was
reported in [1], but it's possible that the table size for this test was
determined by the available disk space.

I think that the amount of data changes performed during the "squeezing"
matters more than the table size. In [2] one user reported "thousands of
UPSERTs per second", but the amount of data also depends on row size, which he
didn't mention.

pg_squeeze gives up if it fails to catch up a few times. The first version of
my patch does not check this; I'll add the corresponding code in the next
version.

> However, in general, the 3rd patch is really big, very hard to
> comprehend.  Please consider splitting this into smaller (and
> reviewable) pieces.

I'll try to move some preparation steps into separate diffs, but I'm not sure
that will make the main diff much smaller. I prefer self-contained patches, as
also explained in [3].

> Also, we obviously need more tests on this. Both tap-test and
> regression tests I suppose.

Sure. The next version will use injection points to test whether "concurrent
data changes" are processed correctly.

> One more thing is about pg_squeeze background workers. They act in an
> autovacuum-like fashion, don't they? Maybe we can support this kind
> of relation processing in core too?

Maybe later. Even just adding the CONCURRENTLY option to CLUSTER and VACUUM
FULL requires quite some effort.


[1] https://github.com/cybertec-postgresql/pg_squeeze/issues/51

[2]
https://github.com/cybertec-postgresql/pg_squeeze/issues/21#issuecomment-514495369

[3] 
http://peter.eisentraut.org/blog/2024/05/14/when-to-split-patches-for-postgresql

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Statistics Import and Export

2024-08-01 Thread Jeff Davis
On Sat, 2024-07-27 at 21:08 -0400, Corey Huinker wrote:
> 
> > I don't like the idea of mixing statistics and control parameters
> > in
> > the same list.
> > 
> 
> 
> There's no way around it; at least now we need never worry about a
> confusing order for the parameters in the _restore_ functions, because
> they can now be in any order you like.

Perhaps I was not precise enough when I said "control" parameters.
Mainly, I was worried about taking parameters that control things like
transaction behavior (in-place vs. MVCC); pg_dump should not be specifying
that kind of thing. A parameter like
"version" is specified by pg_dump anyway, so it's probably fine the way
you've done it.

> SELECT pg_catalog.pg_set_attribute_stats(
>     relation => 'stats_import.test'::regclass::oid,
>     attname => 'arange'::name,
>     inherited => false::boolean,
>     null_frac => 0.5::real,
>     avg_width => 2::integer,
>     n_distinct => -0.1::real,
>     range_empty_frac => 0.5::real,
>     range_length_histogram => '{399,499,Infinity}'::text
>     );
>  pg_set_attribute_stats 
> ------------------------
>  
> (1 row)

I like it.

> and here is a restore function
> 
> -- warning: mcv cast failure
> SELECT *
> FROM pg_catalog.pg_restore_attribute_stats(
>     'relation', 'stats_import.test'::regclass::oid,
>     'attname', 'id'::name,
>     'inherited', false::boolean,
>     'version', 15::integer,
>     'null_frac', 0.5::real,
>     'avg_width', 2::integer,
>     'n_distinct', -0.4::real,
>     'most_common_vals', '{2,four,3}'::text,
>     'most_common_freqs', '{0.3,0.25,0.05}'::real[]
>     );
> WARNING:  invalid input syntax for type integer: "four"
>  row_written |          stats_applied           |            stats_rejected            | params_rejected 
> -------------+----------------------------------+--------------------------------------+-----------------
>  t           | {null_frac,avg_width,n_distinct} | {most_common_vals,most_common_freqs} | 
> (1 row)

I think I like this, as well, except for the return value, which seems
like too much information and a bit over-engineered. Can we simplify it
to what's actually going to be used by pg_upgrade and other tools?

> Attached is v25.

I believe 0001 and 0002 are in good shape API-wise, and I can start
getting those committed. I will try to clean up the code in the
process.

Regards,
Jeff Davis





Re: Recovery of .partial WAL segments

2024-08-01 Thread Stefan Fercot
Hi,

I've added a CF entry for this patch:
https://commitfest.postgresql.org/49/5148/

Not sure why CFbot CI fails on macOS/Windows while it works with the Github
CI on my fork (
https://cirrus-ci.com/github/pgstef/postgres/partial-walseg-recovery).

Many thanks in advance for your feedback and thoughts about this patch,
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

On Thu, Aug 1, 2024 at 10:23 PM Stefan Fercot 
wrote:

> Dear hackers,
>
> Generating a ".partial" WAL segment is pretty common nowadays (using
> pg_receivewal or during standby promotion).
> However, we currently don't do anything with it unless the user manually
> removes that ".partial" extension.
>
> The 028_pitr_timelines tests highlight that fact: with test data
> being in 00020003 and
> 00010003.partial, a recovery following the latest timeline
> (2) will succeed, but it will fail if we follow the current timeline (1).
>
> By simply trying to fetch the ".partial" file in XLogFileRead, we can
> easily recover more data and also cover that (current timeline) recovery
> case.
>
> So, this proposed patch makes XLogFileRead try to restore ".partial" WAL
> archives and adds a test to 028_pitr_timelines using the current
> recovery_target_timeline.
>
> As far as I've seen, the current pg_receivewal tests only seem to cover
> archive generation, not actually recovering from the archives. I
> wasn't sure it was worth adding such tests right now, so I didn't
> consider them for this patch.
>
> Many thanks in advance for your feedback and thoughts about this,
> Kind Regards,
> --
> Stefan FERCOT
> Data Egret (https://dataegret.com)