Re: RFC: Additional Directory for Extensions

2024-04-03 Thread Alvaro Herrera
On 2024-Apr-02, David E. Wheeler wrote:

> That quotation comes from this Debian patch[2] maintained by Christoph
> Berg. I’d like to formally propose integrating this patch into the
> core. And not only because it’s overhead for package maintainers like
> Christoph, but because a number of use cases have emerged since we
> originally discussed something like this back in 2013[3]:

I support the idea of there being a second location from where to load
shared libraries ... but I don't like the idea of making it
runtime-configurable.  If we want to continue to tighten up what
superuser can do, then one of the things that has to go away is the
ability to load shared libraries from arbitrary locations
(dynamic_library_path).  I think we should instead look at making those
locations hardcoded at compile time.  The packager can then decide where
those things go, and the superuser no longer has the ability to load
arbitrary code from arbitrary locations.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: remaining sql/json patches

2024-04-03 Thread jian he
On Wed, Apr 3, 2024 at 11:30 AM jian he  wrote:
>
> On Tue, Apr 2, 2024 at 9:57 PM Amit Langote  wrote:
> >
> > Please let me know if you have further comments on 0001.  I'd like to
> > get that in before spending more energy on 0002.
> >

-- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -2019,6 +2019,9 @@ FigureColnameInternal(Node *node, char **name)
  case JSON_VALUE_OP:
  *name = "json_value";
  return 2;
+ case JSON_TABLE_OP:
+ *name = "json_table";
+ return 2;
  default:
  elog(ERROR, "unrecognized JsonExpr op: %d",
  (int) ((JsonFuncExpr *) node)->op);

"case JSON_TABLE_OP part", no need?
json_table output must provide column name and type?

I did some minor refactor transformJsonTableColumns, make the comments
align with the function intention.
in v48-0001, in transformJsonTableColumns we can `Assert(rawc->name);`.
since non-nested JsonTableColumn must specify column name.
in v48-0002, we can change to `if (rawc->coltype != JTC_NESTED)
Assert(rawc->name);`



SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS (a int PATH '$.a' )
ERROR ON ERROR) jt;
ERROR:  no SQL/JSON item

I thought it should just return NULL.
In this case, I thought that
(not column-level) ERROR ON ERROR should not interfere with "COLUMNS
(a int PATH '$.a' )".

+-- Other miscellanous checks
"miscellanous" should be "miscellaneous".


overall the coverage is pretty high.
the current test output looks fine.


v48-0001-minor-refactor-transformJsonTableColumns.only_for_v48_0001
Description: Binary data


Re: RFC: Additional Directory for Extensions

2024-04-03 Thread walther

Alvaro Herrera:

I support the idea of there being a second location from where to load
shared libraries ... but I don't like the idea of making it
runtime-configurable.  If we want to continue to tighten up what
superuser can do, then one of the things that has to go away is the
ability to load shared libraries from arbitrary locations
(dynamic_library_path).  I think we should instead look at making those
locations hardcoded at compile time.  The packager can then decide where
those things go, and the superuser no longer has the ability to load
arbitrary code from arbitrary locations.


The use-case for runtime configuration of this seems to be build-time 
testing of extensions against an already installed server. For this 
purpose it should be enough to be able to set this directory at startup 
- it doesn't need to be changed while the server is actually running. 
Then you could spin up a temporary postgres instance with the extension 
directory pointing a the build directory and test.


Would startup-configurable be better than runtime-configurable regarding 
your concerns?


I can also imagine that it would be very helpful in a container setup to 
be able to set an environment variable with this path instead of having 
to recompile all of postgres to change it.


Best,

Wolfgang




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Daniel Gustafsson
Buildfarm animal mamba (NetBSD 10.0 on macppc) started failing on this with
what seems like a bogus compiler warning:

waitlsn.c: In function 'WaitForLSN':
waitlsn.c:275:24: error: 'endtime' may be used uninitialized in this function 
[-Werror=maybe-uninitialized]
  275 |delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
  |   ~^~~~
cc1: all warnings being treated as errors

endtime is indeed initialized further up, but initializing endtime at
declaration seems innocent enough and should make this compiler happy and the
buildfarm greener.

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 6679378156..17ad0057ad 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -226,7 +226,7 @@ void
 WaitForLSN(XLogRecPtr targetLSN, int64 timeout)
 {
XLogRecPtr  currentLSN;
-   TimestampTz endtime;
+   TimestampTz endtime = 0;

/* Shouldn't be called when shmem isn't initialized */
Assert(waitLSN);

--
Daniel Gustafsson





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Bertrand Drouvot
Hi,

On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik  wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
> 
> Please find the attached v31 patches implementing the above idea:

Thanks!

Some comments regarding v31-0002:

=== testing the behavior

T1 ===

> - synced slots don't get invalidated due to inactive timeout because
> such slots not considered active at all as they don't perform logical
> decoding (of course, they will perform in fast_forward mode to fix the
> other data loss issue, but they don't generate changes for them to be
> called as *active* slots)

It behaves as described. OTOH non synced logical slots on the standby and
physical slots on the standby are invalidated which is what is expected.

T2 ===

In case the slot is invalidated on the primary,

primary:

postgres=# select slot_name, inactive_since, invalidation_reason from 
pg_replication_slots where slot_name = 's1';
 slot_name |inactive_since | invalidation_reason
---+---+-
 s1| 2024-04-03 06:56:28.075637+00 | inactive_timeout

then on the standby we get:

standby:

postgres=# select slot_name, inactive_since, invalidation_reason from 
pg_replication_slots where slot_name = 's1';
 slot_name |inactive_since| invalidation_reason
---+--+-
 s1| 2024-04-03 07:06:43.37486+00 | inactive_timeout

shouldn't the slot be dropped/recreated instead of updating inactive_since?

=== code

CR1 ===

+Invalidates replication slots that are inactive for longer the
+specified amount of time

s/for longer the/for longer that/?

CR2 ===

+true) as such synced slots don't actually perform
+logical decoding.

We're switching in fast forward logical due to [1], so I'm not sure that's 100%
accurate here. I'm not sure we need to specify a reason.

CR3 ===

+ errdetail("This slot has been invalidated because it was inactive for more 
than the time specified by replication_slot_inactive_timeout parameter.")));

I think we can remove "parameter" (see for example the error message in
validate_remote_info()) and reduce it a bit, something like?

"This slot has been invalidated because it was inactive for more than 
replication_slot_inactive_timeout"?

CR4 ===

+ appendStringInfoString(&err_detail, _("The slot has been inactive for more 
than the time specified by replication_slot_inactive_timeout parameter."));

Same.

CR5 ===

+   /*
+* This function isn't expected to be called for inactive timeout based
+* invalidation. A separate function InvalidateInactiveReplicationSlot 
is
+* to be used for that.

Do you think it's worth to explain why?

CR6 ===

+   if (replication_slot_inactive_timeout == 0)
+   return false;
+   else if (slot->inactive_since > 0)

"else" is not needed here.

CR7 ===

+   SpinLockAcquire(&slot->mutex);
+
+   /*
+* Check if the slot needs to be invalidated due to
+* replication_slot_inactive_timeout GUC. We do this with the 
spinlock
+* held to avoid race conditions -- for example the 
inactive_since
+* could change, or the slot could be dropped.
+*/
+   now = GetCurrentTimestamp();

We should not call GetCurrentTimestamp() while holding a spinlock.

CR8 ===

+# Testcase start: Invalidate streaming standby's slot as well as logical
+# failover slot on primary due to inactive timeout GUC. Also, check the logical

s/inactive timeout GUC/replication_slot_inactive_timeout/?

CR9 ===

+# Start: Helper functions used for this test file
+# End: Helper functions used for this test file

I think that's the first TAP test with this comment. Not saying we should not 
but
why did you feel the need to add those?

[1]: 
https://www.postgresql.org/message-id/os0pr01mb5716b3942ae49f3f725aca9294...@os0pr01mb5716.jpnprd01.prod.outlook.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: RFC: Additional Directory for Extensions

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 09:13, Alvaro Herrera  wrote:
> 
> On 2024-Apr-02, David E. Wheeler wrote:
> 
>> That quotation comes from this Debian patch[2] maintained by Christoph
>> Berg. I’d like to formally propose integrating this patch into the
>> core. And not only because it’s overhead for package maintainers like
>> Christoph, but because a number of use cases have emerged since we
>> originally discussed something like this back in 2013[3]:
> 
> I support the idea of there being a second location from where to load
> shared libraries

Agreed, the case made upthread that installing an extension breaks the app
signing seems like a compelling reason to do this.

The implementation of this need to make sure the directory is properly set up
however to avoid similar problems that CVE 2019-10211 showed.

--
Daniel Gustafsson





Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Andy Fan


> I also can confirm that a lot of users would be very happy to have
> "read your writes consistency" and ready to do something to achieve
> this at an application level.  However, they typically don't know what
> exactly they need.
>
> So, blogging about pg_wal_replay_wait() and spreading words about it
> at conferences would be highly appreciated.

Sure, once it is committed, I promise I can doing a knowledge sharing in
our organization and write a article in chinese language.  

-- 
Best Regards
Andy Fan





Re: Use streaming read API in ANALYZE

2024-04-03 Thread Jakub Wartak
On Tue, Apr 2, 2024 at 9:24 AM Nazir Bilal Yavuz  wrote:
[..]
> v4 is rebased on top of v14 streaming read API changes.

Hi Nazir, so with streaming API committed, I gave a try to this patch.
With autovacuum=off and 30GB table on NVMe (with standard readahead of
256kb and ext4, Debian 12, kernel 6.1.0, shared_buffers = 128MB
default) created using: create table t as select repeat('a', 100) || i
|| repeat('b', 500) as filler from generate_series(1, 4500) as i;

on master, effect of mainteance_io_concurency [default 10] is like
that (when resetting the fs cache after each ANALYZE):

m_io_c = 0:
Time: 3137.914 ms (00:03.138)
Time: 3094.540 ms (00:03.095)
Time: 3452.513 ms (00:03.453)

m_io_c = 1:
Time: 2972.751 ms (00:02.973)
Time: 2939.551 ms (00:02.940)
Time: 2904.428 ms (00:02.904)

m_io_c = 2:
Time: 1580.260 ms (00:01.580)
Time: 1572.132 ms (00:01.572)
Time: 1558.334 ms (00:01.558)

m_io_c = 4:
Time: 938.304 ms
Time: 931.772 ms
Time: 920.044 ms

m_io_c = 8:
Time: 666.025 ms
Time: 660.241 ms
Time: 648.848 ms

m_io_c = 16:
Time: 542.450 ms
Time: 561.155 ms
Time: 539.683 ms

m_io_c = 32:
Time: 538.487 ms
Time: 541.705 ms
Time: 538.101 ms

with patch applied:

m_io_c = 0:
Time: 3106.469 ms (00:03.106)
Time: 3140.343 ms (00:03.140)
Time: 3044.133 ms (00:03.044)

m_io_c = 1:
Time: 2959.817 ms (00:02.960)
Time: 2920.265 ms (00:02.920)
Time: 2911.745 ms (00:02.912)

m_io_c = 2:
Time: 1581.912 ms (00:01.582)
Time: 1561.444 ms (00:01.561)
Time: 1558.251 ms (00:01.558)

m_io_c = 4:
Time: 908.116 ms
Time: 901.245 ms
Time: 901.071 ms

m_io_c = 8:
Time: 619.870 ms
Time: 620.327 ms
Time: 614.266 ms

m_io_c = 16:
Time: 529.885 ms
Time: 526.958 ms
Time: 528.474 ms

m_io_c = 32:
Time: 521.185 ms
Time: 520.713 ms
Time: 517.729 ms

No difference to me, which seems to be good. I've double checked and
patch used the new way

acquire_sample_rows -> heapam_scan_analyze_next_block ->
ReadBufferExtended -> ReadBuffer_common (inlined) -> WaitReadBuffers
-> mdreadv -> FileReadV -> pg_preadv (inlined)
acquire_sample_rows -> heapam_scan_analyze_next_block ->
ReadBufferExtended -> ReadBuffer_common (inlined) -> StartReadBuffer
-> ...

I gave also io_combine_limit to 32 (max, 256kB) a try and got those
slightly better results:

[..]
m_io_c = 16:
Time: 494.599 ms
Time: 496.345 ms
Time: 973.500 ms

m_io_c = 32:
Time: 461.031 ms
Time: 449.037 ms
Time: 443.375 ms

and that (last one) apparently was able to push it to ~50-60k still
random IOPS range, the rareq-sz was still ~8 (9.9) kB as analyze was
still reading random , so I assume no merging was done:

Devicer/s rMB/s   rrqm/s  %rrqm r_await rareq-sz
w/s wMB/s   wrqm/s  %wrqm w_await wareq-sz d/s dMB/s
drqm/s  %drqm d_await dareq-sz f/s f_await  aqu-sz  %util
nvme0n1   61212.00591.82 0.00   0.000.10 9.90
2.00  0.02 0.00   0.000.0012.000.00  0.00
0.00   0.000.00 0.000.000.006.28  85.20

So in short it looks good to me.

-Jakub Wartak.




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alexander Korotkov
Hi, Alvaro!

Thank you for your feedback.

On Wed, Apr 3, 2024 at 9:58 AM Alvaro Herrera  wrote:
> Hello, I noticed that commit 06c418e163e9 uses waitLSN->mutex (a
> spinlock) to protect the contents of waitLSN -- and it's used to walk an
> arbitrary long list of processes waiting ... and also, an arbitrary
> number of processes could be calling this code.  I think using a
> spinlock for this is unwise, as it'll cause busy-waiting whenever
> there's contention.  Wouldn't it be better to use an LWLock for this?
> Then the processes would sleep until the lock is freed.
>
> While nosing about the code, other things struck me:
>
> I think there should be more comments about WaitLSNProcInfo and
> WaitLSNState in waitlsn.h.
>
> In addLSNWaiter it'd be better to assign 'cur' before acquiring the
> lock.
>
> Is a plan array really the most efficient data structure for this,
> considering that you have to reorder each time you add an element?
> Maybe it is, but wouldn't it make sense to use memmove() when adding one
> element rather iterating all the remaining elements to the end of the
> queue?
>
> I think the include list in waitlsn.c could be tightened a bit:

I've just pushed commit, which shortens the include list and fixes the
order of 'cur' assigning and taking spinlock in addLSNWaiter().

Regarding the shmem data structure for LSN waiters.  I didn't pick
LWLock or ConditionVariable, because I needed the ability to wake up
only those waiters whose LSN is already replayed.  In my experience
waking up a process is way slower than scanning a short flat array.

However, I agree that when the number of waiters is very high and flat
array may become a problem.  It seems that the pairing heap is not
hard to use for shmem structures.  The only memory allocation call in
paritingheap.c is in pairingheap_allocate().  So, it's only needed to
be able to initialize the pairing heap in-place, and it will be fine
for shmem.

I'll come back with switching to the pairing heap shortly.

--
Regards,
Alexander Korotkov




Re: Improve eviction algorithm in ReorderBuffer

2024-04-03 Thread Jeff Davis
On Mon, 2024-04-01 at 12:42 +0900, Masahiko Sawada wrote:
> While reviewing the patches, I realized the comment of
> binearyheap_allocate() should also be updated. So I've attached the
> new patches.

In sift_{up|down}, each loop iteration calls set_node(), and each call
to set_node does a hash lookup. I didn't measure it, but that feels
wasteful.

I don't even think you really need the hash table. The key to the hash
table is a pointer, so it's not really doing anything that couldn't be
done more efficiently by just following the pointer.

I suggest that you add a "heap_index" field to ReorderBufferTXN that
would point to the index into the heap's array (the same as
bh_nodeidx_entry.index in your patch). Each time an element moves
within the heap array, just follow the pointer to the ReorderBufferTXN
object and update the heap_index -- no hash lookup required.

That's not easy to do with the current binaryheap API. But a binary
heap is not a terribly complex structure, so you can just do an inline
implementation of it where sift_{up|down} know to update the heap_index
field of the ReorderBufferTXN.

Regards,
Jeff Davis





Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2024-04-03 Thread Jakub Wartak
Hi Andrey,

On Thu, Mar 28, 2024 at 1:09 PM Andrey M. Borodin  wrote:
>
>
>
> > On 8 Aug 2023, at 12:31, John Naylor  wrote:
> >
> > > > Also the shared counter is the cause of the slowdown, but not the 
> > > > reason for the numeric limit.
> > >
> > > Isn't it both? typedef Oid is unsigned int = 2^32, and according to 
> > > GetNewOidWithIndex() logic if we exhaust the whole OID space it will hang 
> > > indefinitely which has the same semantics as "being impossible"/permanent 
> > > hang (?)
> >
> > Looking again, I'm thinking the OID type size is more relevant for the 
> > first paragraph, and the shared/global aspect is more relevant for the 
> > second.
> >
> > The last issue is how to separate the notes at the bottom, since there are 
> > now two topics.
>
> Jakub, do you have plans to address this feedback? Is the CF entry still 
> relevant?

Yes; I've forgotten about this one and clearly I had problems
formulating it in proper shape to be accepted. I've moved it to the
next CF now as this is not critical and I would prefer to help current
timesenistive CF. Anyone is welcome to help amend the patch...

-J.




Re: New Table Access Methods for Multi and Single Inserts

2024-04-03 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis  wrote:
>
> On Sun, 2024-03-31 at 21:18 +0530, Bharath Rupireddy wrote:
> > if (table_modify_buffer_insert() is defined)
> >table_modify_buffer_insert(...);
> > else
> > {
> >   myState->bistate = GetBulkInsertState();
> >   table_tuple_insert(...);
> > }
>
> We can't alloc/free the bulk insert state for every insert call. I see
> two options:
>
> * Each caller needs to support two code paths: if the buffered insert
> APIs are defined, then use those; otherwise the caller needs to manage
> the bulk insert state itself and call the plain insert API.
>
> * Have default implementation for the new API methods, so that the
> default for the begin method would allocate the bulk insert state, and
> the default for the buffered insert method would be to call plain
> insert using the bulk insert state.
>
> I'd prefer the latter, at least in the long term. But I haven't really
> thought through the details, so perhaps we'd need to use the former.

I too prefer the latter so that the caller doesn't have to have two
paths. The new API can just transparently fallback to single inserts.
I've implemented that in the attached v17 patch. I also tested the
default APIs manually, but I'll see if I can add some tests to it the
default API.

> > > After we have these new APIs fully in place and used by COPY, what
> > > will
> > > happen to those other APIs? Will they be deprecated or will there
> > > be a
> > > reason to keep them?
> >
> > Deprecated perhaps?
>
> Including Alexander on this thread, because he's making changes to the
> multi-insert API. We need some consensus on where we are going with
> these APIs before we make more changes, and what incremental steps make
> sense in v17.
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.
>
> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)
>
> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.
>
> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.
>
> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.
>
> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).
>
> 10. Use these new methods for logical apply.
>
> 11. Deprecate the multi_insert API.
>
> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

I'd like to see the new multi insert API (as proposed in the v17
patches) for PG17 if possible. The basic idea with these new APIs is
to let the AM implementers choose the right buffered insert strategy
(one can choose the AM specific slot type to buffer the tuples, choose
the AM specific memory and flushing decisions etc.). Another advantage
with these new multi insert API is that the CREATE MATERIALIZED VIEW,
REFRESH MATERIALIZED VIEW, CREATE TABLE AS commands for heap AM got
faster by 62.54%, 68.87%, 74.31% or 2.67, 3.21, 3.89 times
respectively. The performance improvement in REFRESH MATERIALIZED VIEW
can benefit customers running analytical w

Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

2024-04-03 Thread Daniel Gustafsson
> On 6 Mar 2024, at 09:59, Daniel Gustafsson  wrote:

> Nothing sticks out from reading through these patches, they seem quite ready 
> to
> me.

Took another look at this today and committed it. Thanks!

--
Daniel Gustafsson





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread shveta malik
On Wed, Apr 3, 2024 at 11:17 AM Bharath Rupireddy
 wrote:
>
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik  wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
>
> Please find the attached v31 patches implementing the above idea:
>

Thanks for the patches, please find few comments:

v31-001:

1)
system-views.sgml:
value will get updated  after every synchronization from the
corresponding remote slot on the primary.

--This is confusing. It will be good to rephrase it.

2)
update_synced_slots_inactive_since()

--May be, we should mention in the header that this function is called
only during promotion.

3) 040_standby_failover_slots_sync.pl:
We capture inactive_since_on_primary when we do this for the first time at #175
ALTER SUBSCRIPTION regress_mysub1 DISABLE"

But we again recreate the sub and disable it at line #280.
Do you think we shall get inactive_since_on_primary again here, to be
compared with inactive_since_on_new_primary later?


v31-002:
(I had reviewed v29-002 but missed to post comments,  I think these
are still applicable)

1) I think replication_slot_inactivity_timeout was recommended here
(instead of replication_slot_inactive_timeout, so please give it a
thought):
https://www.postgresql.org/message-id/202403260739.udlp7lxixktx%40alvherre.pgsql

2) Commit msg:
a)
"It is often easy for developers to set a timeout of say 1
or 2 or 3 days at slot level, after which the inactive slots get
dropped."

Shall we say invalidated rather than dropped?

b)
"To achieve the above, postgres introduces a GUC allowing users
set inactive timeout and then a slot stays inactive for this much
amount of time it invalidates the slot."

Broken sentence.



thanks
Shveta




Re: remaining sql/json patches

2024-04-03 Thread jian he
On Wed, Apr 3, 2024 at 3:15 PM jian he  wrote:
>
> On Wed, Apr 3, 2024 at 11:30 AM jian he  wrote:
> >
> > On Tue, Apr 2, 2024 at 9:57 PM Amit Langote  wrote:
> > >
> > > Please let me know if you have further comments on 0001.  I'd like to
> > > get that in before spending more energy on 0002.
> > >

more doc issue with v48. 0001, 0002.

 The optional json_path_name serves as an
 identifier of the provided path_expression.
 The path name must be unique and distinct from the column names.

"path name" should be
json_path_name


git diff --check
doc/src/sgml/func.sgml:19192: trailing whitespace.
+ id |   kind   |  title  | director


+  
+   JSON data stored at a nested level of the row pattern can be extracted using
+   the NESTED PATH clause.  Each
+   NESTED PATH clause can be used to generate one or more
+   columns using the data from a nested level of the row pattern, which can be
+   specified using a COLUMNS clause.  Rows constructed from
+   such columns are called child rows and are joined
+   agaist the row constructed from the columns specified in the parent
+   COLUMNS clause to get the row in the final view.  Child
+   columns may themselves contain a NESTED PATH
+   specifification thus allowing to extract data located at arbitrary nesting
+   levels.  Columns produced by NESTED PATHs at the same
+   level are considered to be siblings and are joined
+   with each other before joining to the parent row.
+  

"agaist" should be "against".
"specifification" should be "specification".
+Rows constructed from
+   such columns are called child rows and are joined
+   agaist the row constructed from the columns specified in the parent
+   COLUMNS clause to get the row in the final view.
this sentence is long, not easy to comprehend, maybe we can rephrase it
or split it into two.



+  | NESTED PATH json_path_specification
 AS path_name 
+COLUMNS ( json_table_column
, ... )
v48, 0002 patch.
in the json_table synopsis section, put these two lines into one line,
I think would make it more readable.
also the following sgml code will render the html into one line.

  NESTED PATH
json_path_specification 
AS json_path_name

  COLUMNS (
json_table_column ,
... )


also path_name should be
json_path_name.



+
+ The NESTED PATH syntax is recursive,
+ so you can go down multiple nested levels by specifying several
+ NESTED PATH subclauses within each other.
+ It allows to unnest the hierarchy of JSON objects and arrays
+ in a single function invocation rather than chaining several
+ JSON_TABLE expressions in an SQL statement.
+
"The NESTED PATH syntax is recursive"
should be
`
The NESTED PATH syntax can be recursive,
you can go down multiple nested levels by specifying several
NESTED PATH subclauses within each other.
`




RE: Psql meta-command conninfo+

2024-04-03 Thread Maiquel Grassi
Hello Sami,

(v26)

>  Here is an example [1] where the session information functions are 
> referenced.

>  The public doc for this example is in [2].


>  Something similar can be done here. For example:


>  The session user's name; see the session_user() 
> function in

>   for more details.

I updated the patch.

Thank you for this help.

Regards,
Maiquel Grassi.


v26-0001-psql-meta-command-conninfo-plus.patch
Description: v26-0001-psql-meta-command-conninfo-plus.patch


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 12:20 PM Bertrand Drouvot
 wrote:
>
> On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> > On Wed, Apr 3, 2024 at 8:38 AM shveta malik  wrote:
> > >
> > > > > Or a simple solution is that the slotsync worker updates
> > > > > inactive_since as it does for non-synced slots, and disables
> > > > > timeout-based slot invalidation for synced slots.
> > >
> > > I like this idea better, it takes care of such a case too when the
> > > user is relying on sync-function rather than worker and does not want
> > > to get the slots invalidated in between 2 sync function calls.
> >
> > Please find the attached v31 patches implementing the above idea:
>
> Thanks!
>
> Some comments related to v31-0001:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots get their on inactive_since just like any other slot
>
> It behaves as described.
>
> T2 ===
>
> > - synced slots inactive_since is set to current timestamp after the
> > standby gets promoted to help inactive_since interpret correctly just
> > like any other slot.
>
> It behaves as described.
>
> CR1 ===
>
> +inactive_since value will get updated
> +after every synchronization
>
> indicates the last synchronization time? (I think that after every 
> synchronization
> could lead to confusion).
>

+1.

> CR2 ===
>
> +   /*
> +* Set the time since the slot has become inactive 
> after shutting
> +* down slot sync machinery. This helps correctly 
> interpret the
> +* time if the standby gets promoted without a 
> restart.
> +*/
>
> It looks to me that this comment is not at the right place because there is
> nothing after the comment that indicates that we shutdown the "slot sync 
> machinery".
>
> Maybe a better place is before the function definition and mention that this 
> is
> currently called when we shutdown the "slot sync machinery"?
>

Won't it be better to have an assert for SlotSyncCtx->pid? IIRC, we
have some existing issues where we don't ensure that no one is running
sync API before shutdown is complete but I think we can deal with that
separately and here we can still have an Assert.

> CR3 ===
>
> +* We get the current time beforehand and only once 
> to avoid
> +* system calls overhead while holding the lock.
>
> s/avoid system calls overhead while holding the lock/avoid system calls while 
> holding the spinlock/?
>

Is it valid to say that there is overhead of this call while holding
spinlock? Because I don't think at the time of promotion we expect any
other concurrent slot activity. The first reason seems good enough.

One other observation:
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -42,6 +42,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogutils.h"

Is there a reason for this inclusion? I don't see any change which
should need this one.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila  wrote:
>
> On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
>  wrote:
>
> > I quickly looked at v8, and have a nit, rest all looks good.
> >
> > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > +*found_consistent_snapshot = true;
> >
> > Can the found_consistent_snapshot be checked first to help avoid the
> > function call DecodingContextReady() for pg_replication_slot_advance
> > callers?
> >
>
> Okay, changed. Additionally, I have updated the comments and commit
> message. I'll push this patch after some more testing.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Use streaming read API in ANALYZE

2024-04-03 Thread Nazir Bilal Yavuz
Hi,

On Tue, 2 Apr 2024 at 10:23, Nazir Bilal Yavuz  wrote:
>
> v4 is rebased on top of v14 streaming read API changes.

Streaming API has been committed but the committed version has a minor
change, the read_stream_begin_relation function takes Relation instead
of BufferManagerRelation now. So, here is a v5 which addresses this
change.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 43e2f2b32e2fdb7e1fd787b1d8595768741f4792 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Wed, 3 Apr 2024 01:22:59 +0300
Subject: [PATCH v5] Use streaming read API in ANALYZE

ANALYZE command gets random tuples using BlockSampler algorithm. Use
streaming reads to get these tuples by using BlockSampler algorithm in
streaming read API prefetch logic.
---
 src/include/access/heapam.h  |  4 +-
 src/backend/access/heap/heapam_handler.c | 16 ++---
 src/backend/commands/analyze.c   | 85 
 3 files changed, 36 insertions(+), 69 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c4..4e35caeb42e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -374,8 +375,7 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 
 /* in heap/heapam_handler.c*/
 extern void heapam_scan_analyze_next_block(TableScanDesc scan,
-		   BlockNumber blockno,
-		   BufferAccessStrategy bstrategy);
+		   ReadStream *stream);
 extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan,
 		   TransactionId OldestXmin,
 		   double *liverows, double *deadrows,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245b..0533d9660c4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1054,16 +1054,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 }
 
 /*
- * Prepare to analyze block `blockno` of `scan`.  The scan has been started
- * with SO_TYPE_ANALYZE option.
+ * Prepare to analyze block returned from streaming object.  The scan has been
+ * started with SO_TYPE_ANALYZE option.
  *
  * This routine holds a buffer pin and lock on the heap page.  They are held
  * until heapam_scan_analyze_next_tuple() returns false.  That is until all the
  * items of the heap page are analyzed.
  */
 void
-heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno,
-			   BufferAccessStrategy bstrategy)
+heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
 {
 	HeapScanDesc hscan = (HeapScanDesc) scan;
 
@@ -1076,11 +1075,12 @@ heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno,
 	 * doing much work per tuple, the extra lock traffic is probably better
 	 * avoided.
 	 */
-	hscan->rs_cblock = blockno;
-	hscan->rs_cindex = FirstOffsetNumber;
-	hscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM,
-		blockno, RBM_NORMAL, bstrategy);
+	hscan->rs_cbuf = read_stream_next_buffer(stream, NULL);
+	Assert(BufferIsValid(hscan->rs_cbuf));
 	LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+	hscan->rs_cblock = BufferGetBlockNumber(hscan->rs_cbuf);
+	hscan->rs_cindex = FirstOffsetNumber;
 }
 
 /*
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 2fb39f3ede1..105285c3ea2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1102,6 +1102,20 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	return stats;
 }
 
+/*
+ * Prefetch callback function to get next block number while using
+ * BlockSampling algorithm
+ */
+static BlockNumber
+block_sampling_streaming_read_next(ReadStream *stream,
+   void *user_data,
+   void *per_buffer_data)
+{
+	BlockSamplerData *bs = user_data;
+
+	return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber;
+}
+
 /*
  * acquire_sample_rows -- acquire a random sample of rows from the heap
  *
@@ -1154,10 +1168,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
-#ifdef USE_PREFETCH
-	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
-	BlockSamplerData prefetch_bs;
-#endif
+	ReadStream *stream;
 
 	Assert(targrows > 0);
 
@@ -1170,13 +1181,6 @@ acquire_sample_rows(Relation onerel, int elevel,
 	randseed = pg_prng_uint32(&pg_global_prng_state);
 	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
 
-#ifdef USE_PREFETCH
-	prefetch_maximum = get_tablespace_maintenance_io_concurrency(onerel->rd_rel->reltablespace);
-	/* Create another BlockSampler, using the same seed, for prefetching */
-	if (prefetch_maximum)
-		(void) BlockSampler_Init(&prefetch_bs, totalblocks, targ

Re: Support prepared statement invalidation when result types change

2024-04-03 Thread Jelte Fennema-Nio
On Sun, 7 Jan 2024 at 07:55, vignesh C  wrote:
> One of the test has aborted in CFBot at [1] with:

Rebased the patchset and removed patch 0003 since it was causing the
CI issue reported by vignesh and it seems much less useful and more
controversial to me anyway (due to the extra required roundtrip).


On Sun, 7 Jan 2024 at 09:17, Shay Rojansky  wrote:
> Just to point out, FWIW, that the .NET Npgsql driver does indeed cache 
> RowDescriptions... The whole point of preparation is to optimize things as 
> much as possible for repeated execution of the query; I get that the value 
> there is much lower than e.g. doing another network roundtrip, but that's 
> still extra work that's better off being cut if it can be.

Hmm, interesting. I totally agree that it's always good to do less
when possible. The problem is that in the face of server side prepared
statement invalidations due to DDL changes to the table or search path
changes, the row types might change. Or the server needs to constantly
throw an error, like it does now, but that seems worse imho.

I'm wondering though if we can create a middleground, where a client
can still cache the RowDescription client side when no DDL or
search_patch changes are happening. But still tell the client about a
new RowDescription when they do happen.

The only way of doing that I can think of is changing the Postgres
protocol in a way similar to this: Allow Execute to return a
RowDescription too, but have the server only do so when the previously
received RowDescription for this prepared statement is now invalid.

This would definitely require some additional tracking at PgBouncer to
make it work, i.e. per client and per server it should now keep track
of the last RowDescription for each prepared statement. But that's
definitely something we could do.

This would make this patch much more involved though, as now it would
suddenly involve an actual protocol change, and that basically depends
on this patch moving forward[1].

[1]: 
https://www.postgresql.org/message-id/flat/cageczqtg2hcmb5gau53uuwcdc7gcnjfll6mnw0wnhwhgq9u...@mail.gmail.com


v6-0002-Completely-remove-fixed_result-from-CachedPlanSou.patch
Description: Binary data


v6-0001-Support-prepared-statement-invalidation-when-resu.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 2:58 PM shveta malik  wrote:
>
> On Wed, Apr 3, 2024 at 11:17 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Apr 3, 2024 at 8:38 AM shveta malik  wrote:
> > >
> > > > > Or a simple solution is that the slotsync worker updates
> > > > > inactive_since as it does for non-synced slots, and disables
> > > > > timeout-based slot invalidation for synced slots.
> > >
> > > I like this idea better, it takes care of such a case too when the
> > > user is relying on sync-function rather than worker and does not want
> > > to get the slots invalidated in between 2 sync function calls.
> >
> > Please find the attached v31 patches implementing the above idea:
> >
>
> Thanks for the patches, please find few comments:
>
> v31-001:
>
> 1)
> system-views.sgml:
> value will get updated  after every synchronization from the
> corresponding remote slot on the primary.
>
> --This is confusing. It will be good to rephrase it.
>
> 2)
> update_synced_slots_inactive_since()
>
> --May be, we should mention in the header that this function is called
> only during promotion.
>
> 3) 040_standby_failover_slots_sync.pl:
> We capture inactive_since_on_primary when we do this for the first time at 
> #175
> ALTER SUBSCRIPTION regress_mysub1 DISABLE"
>
> But we again recreate the sub and disable it at line #280.
> Do you think we shall get inactive_since_on_primary again here, to be
> compared with inactive_since_on_new_primary later?
>

I think so.

Few additional comments on tests:
1.
+is( $standby1->safe_psql(
+ 'postgres',
+ "SELECT '$inactive_since_on_primary'::timestamptz <
'$inactive_since_on_standby'::timestamptz AND
+ '$inactive_since_on_standby'::timestamptz < '$slot_sync_time'::timestamptz;"

Shall we do <= check as we are doing in the main function
get_slot_inactive_since_value as the time duration is less so it can
be the same as well? Similarly, please check other tests.

2.
+=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)
+
+Get inactive_since column value for a given replication slot validating it
+against optional reference time.
+
+=cut
+
+sub get_slot_inactive_since_value

I see that all callers validate against reference time. It is better
to name it validate_slot_inactive_since rather than using get_* as the
main purpose is to validate the passed value.

-- 
With Regards,
Amit Kapila.




Re: LogwrtResult contended spinlock

2024-04-03 Thread Alvaro Herrera
Thanks for keeping this moving forward.  I gave your proposed patches a
look.   One thing I didn't like much is that we're adding a new member
(Copy) to XLogwrtAtomic -- but this struct is supposed to be a mirror of
XLogwrtResult for use with atomic access.  Since this new member is not
added to XLogwrtResult (because it's not needed there), the whole idea
of there being symmetry between those two structs crumbles down.
Because we later stop using struct-assign anyway, meaning we no longer
need the structs to match, we can instead spell out the members in
XLogCtl and call it a day.

So what I do in the attached 0001 is stop using the XLogwrtResult struct
in XLogCtl and replace it with separate Write and Flush values, and add
the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
and Flush from the shared XLogCtl to the local variable given as macro
argument.  (I also added our idiomatic do {} while(0) to the macro
definition, for safety).  The new members are XLogCtl->logWriteResult
and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
essentially identical semantics as the previous code.  No atomic access
yet!

0002 then adds pg_atomic_monotonic_advance_u64.  (I don't add the _u32
variant, because I don't think it's a great idea to add dead code.  If
later we see a need for it we can put it in.)  It also changes the two
new members to be atomics, changes the macro to use atomic read, and
XLogWrite now uses monotonic increment.  A couple of other places can
move the macro calls to occur outside the spinlock.  Also, XLogWrite
gains the invariant checking that involves Write and Flush.

Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and
Flush, and updates WALReadFromBuffers to test that instead of the Write
pointer, and adds in XLogWrite the invariant checks that involve the
Copy pointer.

I haven't rerun Bharath test loop yet; will do so shortly.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)
https://postgr.es/m/20050809113420.gd2...@phlogiston.dyndns.org
>From e6ddbe87d598c6a1090e39845a4a308060f0af1e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 3 Apr 2024 11:23:12 +0200
Subject: [PATCH v15 1/3] split XLogCtl->LogwrtResult into separate struct
 members

---
 src/backend/access/transam/xlog.c | 59 ++-
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87ea03954b..6213d99561 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -291,16 +291,15 @@ static bool doPageWrites;
  *
  * LogwrtRqst indicates a byte position that we need to write and/or fsync
  * the log up to (all records before that point must be written or fsynced).
- * LogwrtResult indicates the byte positions we have already written/fsynced.
- * These structs are identical but are declared separately to indicate their
- * slightly different functions.
+ * The positions already written/fsynced are maintained in logWriteResult
+ * and logFlushResult.
  *
- * To read XLogCtl->LogwrtResult, you must hold either info_lck or
- * WALWriteLock.  To update it, you need to hold both locks.  The point of
- * this arrangement is that the value can be examined by code that already
- * holds WALWriteLock without needing to grab info_lck as well.  In addition
- * to the shared variable, each backend has a private copy of LogwrtResult,
- * which is updated when convenient.
+ * To read XLogCtl->logWriteResult or ->logFlushResult, you must hold either
+ * info_lck or WALWriteLock.  To update them, you need to hold both locks.
+ * The point of this arrangement is that the value can be examined by code
+ * that already holds WALWriteLock without needing to grab info_lck as well.
+ * In addition to the shared variable, each backend has a private copy of
+ * both in LogwrtResult, which is updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
  * (protected by info_lck), but we don't need to cache any copies of it.
@@ -478,7 +477,8 @@ typedef struct XLogCtlData
 	 * Protected by info_lck and WALWriteLock (you must hold either lock to
 	 * read it, but both to update)
 	 */
-	XLogwrtResult LogwrtResult;
+	XLogRecPtr	logWriteResult; /* last byte + 1 written out */
+	XLogRecPtr	logFlushResult; /* last byte + 1 flushed */
 
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
@@ -614,6 +614,15 @@ static int	UsableBytesInSegment;
  */
 static XLogwrtResult LogwrtResult = {0, 0};
 
+/*
+ * Update local copy of shared XLogCtl->log{Write,Flush}Result
+ */
+#define XLogUpdateLocalLogwrtResult(_target) \
+	do { \
+		_target.Write = XLogCtl->logWrit

Re: Fix out-of-bounds in the function PQescapeinternal (src/interfaces/libpq/fe-exec.c)

2024-04-03 Thread Ranier Vilela
Em ter., 2 de abr. de 2024 às 18:13, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > While I working in [1], Coverity reported some errors:
> > src/bin/pg_basebackup/pg_createsubscriber.c
> > CID 1542690: (#1 of 2): Out-of-bounds access (OVERRUN)
> > alloc_strlen: Allocating insufficient memory for the terminating null of
> > the string. [Note: The source code implementation of the function has
> been
> > overridden by a builtin model.]
> > CID 1542690: (#2 of 2): Out-of-bounds access (OVERRUN)
> > alloc_strlen: Allocating insufficient memory for the terminating null of
> > the string. [Note: The source code implementation of the function has
> been
> > overridden by a builtin model.]
>
> Yeah, we saw that in the community run too.  I'm tempted to call it
> an AI hallucination.  The "Note" seems to mean that they're not
> actually analyzing our code but some made-up substitute.
>
Yeah, the message is a little confusing.
It seems to me that it is a replacement of the malloc function with its
own, with some type of security cookie,
like /GS (Buffer Security Check)



> > The source of errors is the function PQescapeInternal.
> > The slow path has bugs when num_quotes or num_backslashes are greater
> than
> > zero.
> > For each num_quotes or num_backslahes we need to allocate two more.
>
> Nonsense.  The quote or backslash is already counted in input_len,
> so we need to add just one more.
>
Right, the quote or backslash is counted in input_len.
In the test I did, the string had 10 quotes, so
input_len had at least 10 bytes for quotes.
But we write 20 quotes, in the slow path.

if (*s == quote_char || (!as_ident && *s == '\\'))
*rp++ = *s;
*rp++ = *s;

|0| = quote_char
|1| = quote_char
|2| = quote_char
|3| = quote_char
|4| = quote_char
|5| = quote_char
|6| = quote_char
|7| = quote_char
|8| = quote_char
|9| = quote_char
|10| = quote_char <--memory overwrite
|11| = quote_char
|12| = quote_char
|13| = quote_char
|14| = quote_char
|15| = quote_char
|16| = quote_char
|17| = quote_char
|18| = quote_char
|19| = quote_char



> If there were anything wrong here, I'm quite sure our testing under
> e.g. valgrind would have caught it years ago.  However, just to be
> sure, I tried adding an Assert that the allocated space is filled
> exactly, as attached.  It gets through check-world just fine.
>
It seems to me that only the fast path is being validated by the assert.

if (num_quotes == 0 && (num_backslashes == 0 || as_ident))
{
memcpy(rp, str, input_len);
rp += input_len;
}

best regards,
Ranier Vilela


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 12:20 PM Bertrand Drouvot
 wrote:
>
> > Please find the attached v31 patches implementing the above idea:
>
> Some comments related to v31-0001:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots get their on inactive_since just like any other slot
>
> It behaves as described.
>
> T2 ===
>
> > - synced slots inactive_since is set to current timestamp after the
> > standby gets promoted to help inactive_since interpret correctly just
> > like any other slot.
>
> It behaves as described.

Thanks for testing.

> CR1 ===
>
> +inactive_since value will get updated
> +after every synchronization
>
> indicates the last synchronization time? (I think that after every 
> synchronization
> could lead to confusion).

Done.

> CR2 ===
>
> +   /*
> +* Set the time since the slot has become inactive 
> after shutting
> +* down slot sync machinery. This helps correctly 
> interpret the
> +* time if the standby gets promoted without a 
> restart.
> +*/
>
> It looks to me that this comment is not at the right place because there is
> nothing after the comment that indicates that we shutdown the "slot sync 
> machinery".
>
> Maybe a better place is before the function definition and mention that this 
> is
> currently called when we shutdown the "slot sync machinery"?

Done.

> CR3 ===
>
> +* We get the current time beforehand and only once 
> to avoid
> +* system calls overhead while holding the lock.
>
> s/avoid system calls overhead while holding the lock/avoid system calls while 
> holding the spinlock/?

Done.

> CR4 ===
>
> +* Set the time since the slot has become inactive. We get the current
> +* time beforehand to avoid system call overhead while holding the 
> lock
>
> Same.

Done.

> CR5 ===
>
> +   # Check that the captured time is sane
> +   if (defined $reference_time)
> +   {
>
> s/Check that the captured time is sane/Check that the inactive_since is sane/?
>
> Sorry if some of those comments could have been done while I did review 
> v29-0001.

Done.

On Wed, Apr 3, 2024 at 2:58 PM shveta malik  wrote:
>
> Thanks for the patches, please find few comments:
>
> v31-001:
>
> 1)
> system-views.sgml:
> value will get updated  after every synchronization from the
> corresponding remote slot on the primary.
>
> --This is confusing. It will be good to rephrase it.

Done as per Bertrand's suggestion.

> 2)
> update_synced_slots_inactive_since()
>
> --May be, we should mention in the header that this function is called
> only during promotion.

Done as per Bertrand's suggestion.

> 3) 040_standby_failover_slots_sync.pl:
> We capture inactive_since_on_primary when we do this for the first time at 
> #175
> ALTER SUBSCRIPTION regress_mysub1 DISABLE"
>
> But we again recreate the sub and disable it at line #280.
> Do you think we shall get inactive_since_on_primary again here, to be
> compared with inactive_since_on_new_primary later?

Hm. Done that. Recapturing both slot_creation_time_on_primary and
inactive_since_on_primary before and after CREATE SUBSCRIPTION creates
the slot again on the primary/publisher.

On Wed, Apr 3, 2024 at 3:32 PM Amit Kapila  wrote:
>
> > CR2 ===
> >
> > +   /*
> > +* Set the time since the slot has become inactive 
> > after shutting
> > +* down slot sync machinery. This helps correctly 
> > interpret the
> > +* time if the standby gets promoted without a 
> > restart.
> > +*/
> >
> > It looks to me that this comment is not at the right place because there is
> > nothing after the comment that indicates that we shutdown the "slot sync 
> > machinery".
> >
> > Maybe a better place is before the function definition and mention that 
> > this is
> > currently called when we shutdown the "slot sync machinery"?
> >
> Won't it be better to have an assert for SlotSyncCtx->pid? IIRC, we
> have some existing issues where we don't ensure that no one is running
> sync API before shutdown is complete but I think we can deal with that
> separately and here we can still have an Assert.

That can work to ensure the slot sync worker isn't running as
SlotSyncCtx->pid gets updated only for the slot sync worker. I added
this assertion for now.

We need to ensure (in a separate patch and thread) there is no backend
acquiring it and performing sync while the slot sync worker is
shutting down. Otherwise, some of the slots can get resynced and some
are not while we are shutting down the slot sync worker as part of the
standby promotion which might leave the slots in an inconsistent
state.

> > CR3 ===
> >
> > +* We get the current time beforehand and only once 
> > to avoid
> > +* system calls overhead while holding 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-04-03 Thread Alexander Lakhin

Hello Alvaro,

27.02.2024 20:33, Alvaro Herrera wrote:

Here's the complete set, with these two names using the singular.



I've managed to trigger an assert added by 53c2a97a9.
Please try the following script against a server compiled with
-DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
define, it just simplifies reproducing...):
# initdb & start ...

createdb test
echo "
SELECT pg_current_xact_id() AS tx
\gset

SELECT format('CREATE TABLE t%s(i int)', g)
  FROM generate_series(1, 1022 - :tx) g
\gexec

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT pg_current_xact_id();
SELECT pg_sleep(5);
" | psql test &

echo "
SELECT pg_sleep(1);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 1 INTO a;
COMMIT;

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 2 INTO b;
" | psql test

It fails for me with the following stack trace:
TRAP: failed Assert("LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE)"), File: "slru.c", Line: 366, 
PID: 21711

ExceptionalCondition at assert.c:52:13
SimpleLruZeroPage at slru.c:369:11
SerialAdd at predicate.c:921:20
SummarizeOldestCommittedSxact at predicate.c:1521:2
GetSerializableTransactionSnapshotInt at predicate.c:1787:16
GetSerializableTransactionSnapshot at predicate.c:1691:1
GetTransactionSnapshot at snapmgr.c:264:21
exec_simple_query at postgres.c:1162:4
...

Best regards,
Alexander




Re: LogwrtResult contended spinlock

2024-04-03 Thread Alvaro Herrera
On 2024-Apr-03, Alvaro Herrera wrote:

> So what I do in the attached 0001 is stop using the XLogwrtResult struct
> in XLogCtl and replace it with separate Write and Flush values, and add
> the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> and Flush from the shared XLogCtl to the local variable given as macro
> argument.  (I also added our idiomatic do {} while(0) to the macro
> definition, for safety).  The new members are XLogCtl->logWriteResult
> and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> essentially identical semantics as the previous code.  No atomic access
> yet!

BTW I forgot.  I didn't like the name XLogUpdateLocalLogwrtResult() name
much.  What do you think about RefreshXLogWriteResult() instead?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)




Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

2024-04-03 Thread Nazir Bilal Yavuz
Hi,

On Wed, 3 Apr 2024 at 12:11, Daniel Gustafsson  wrote:
>
> > On 6 Mar 2024, at 09:59, Daniel Gustafsson  wrote:
>
> > Nothing sticks out from reading through these patches, they seem quite 
> > ready to
> > me.
>
> Took another look at this today and committed it. Thanks!

Thanks for the commit!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Streaming I/O, vectored I/O (WIP)

2024-04-03 Thread Melanie Plageman
On Tue, Apr 2, 2024 at 8:32 PM Thomas Munro  wrote:
>
> Here are the remaining patches discussed in this thread.  They give
> tablespace-specific io_combine_limit, effective_io_readahead_window
> (is this useful?), and up-to-1MB io_combine_limit (is this useful?).
> I think the first two would probably require teaching reloption.c how
> to use guc.c's parse_int() and unit flags, but I won't have time to
> look at that for this release so I'll just leave these here.
>
> On the subject of guc.c, this is a terrible error message... did I do
> something wrong?
>
> postgres=# set io_combine_limit = '42MB';
> ERROR:  5376 8kB is outside the valid range for parameter
> "io_combine_limit" (1 .. 32)

Well, GUC_UNIT_BLOCKS interpolates the block limit into the error
message string (get_config_unit_name()). But, I can't imagine this
error message is clear for any of the GUCs using GUC_UNIT_BLOCKS. I
would think some combination of the two would be helpful, like "43008
kB (5376 blocks) is outside of the valid range for parameter". The
user can check what their block size is. I don't think we need to
interpolate and print the block size in the error message.

On another note, since io_combine_limit, when specified in size,
rounds up to the nearest multiple of blocksize, it might be worth
mentioning this in the io_combine_limit docs at some point. I checked
docs for another GUC_UNIT_BLOCKS guc, backend_flush_after, and it
alludes to this.

- Melanie




Re: New Table Access Methods for Multi and Single Inserts

2024-04-03 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 2:32 PM Bharath Rupireddy
 wrote:
>
> I too prefer the latter so that the caller doesn't have to have two
> paths. The new API can just transparently fallback to single inserts.
> I've implemented that in the attached v17 patch. I also tested the
> default APIs manually, but I'll see if I can add some tests to it the
> default API.

Fixed a compiler warning found via CF bot. Please find the attached
v18 patches. I'm sorry for the noise.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From ff1278b77e0d6ac6a49f0826602bd948e78c7a91 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 3 Apr 2024 11:56:14 +
Subject: [PATCH v18 1/2] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c | 189 ++-
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableam.c   |  86 +++
 src/backend/access/table/tableamapi.c|   8 +
 src/include/access/heapam.h  |  41 +
 src/include/access/tableam.h | 108 +
 src/tools/pgindent/typedefs.list |   3 +
 7 files changed, 439 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..69f8c597d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,8 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 		bool *copy);
-
+static void heap_modify_buffer_flush(TableModifyState *state);
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2441,6 +2443,191 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	*insert_indexes = true;
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags, CommandId cid,
+  int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+	"heap_modify memory context",
+	ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+		  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_cb = heap_modify_insert_end;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+	 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->c

Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Heikki Linnakangas

On 02/04/2024 16:11, Heikki Linnakangas wrote:

On 01/04/2024 20:22, Melanie Plageman wrote:

Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.


Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes I'm afraid, so I'm afraid you'll
have to just diff against the previous version if you want to see what's
changed. I hope I didn't break anything.

I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.


This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.


I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
be nice to have just one copy of this in some common place, but I also
wasn't sure where to put it.


One more version, with two small fixes:

1. I fumbled the offsetnumber-cmp function at the last minute so that it 
didn't compile. Fixed. that


2. On VACUUM on an unlogged or temp table, the logic always thought that 
we would be generating an FPI, causing it to always freeze when it 
could. But of course, you never generate FPIs on an unlogged table. 
Fixed that. (Perhaps we should indeed freeze more aggressively on an 
unlogged table, but changing the heuristic is out of scope for this patch.)


Off-list, Melanie reported that there is a small regression with the 
benchmark script she posted yesterday, after all, but I'm not able to 
reproduce that.


--
Heikki Linnakangas
Neon (https://neon.tech)
From e04bda4666d7eaff0c520a9c9e1468a9c4cc9f51 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v13 1/3] Refactor how heap_prune_chain() updates prunable_xid

In preparation of freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.

Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.

Author: Melanie Plageman 
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
 src/backend/access/heap/pruneheap.c | 125 
 1 file changed, 92 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa5..1b5bf990d21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
 
 static void page_verify_redirects(Page page);
 
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
 			continue;
 		}
 
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			if (unlikely(prstate.mark_unused_now))
 heap_prune_record_unused(&prstate, offnum, false);
 			else
-heap_prune_record_unchanged(&prstate, offnum);
+heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
 			continue;
 		}
 
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			}
 		}
 		else
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 		 */
 		chainitems[nchain++] = offnum;
 
-		/*
-		 * Check tuple's visibility statu

Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 3:57 AM, walt...@technowledgy.de wrote:

> I can also imagine that it would be very helpful in a container setup to be 
> able to set an environment variable with this path instead of having to 
> recompile all of postgres to change it.

Yes, I like the suggestion to make it require a restart, which lets the 
sysadmin control it and not limited to whatever the person who compiled it 
thought would make sense.

Best,

David





Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Or SIGHUP?

D



Re: Separate memory contexts for relcache and catcache

2024-04-03 Thread Melih Mutlu
vignesh C , 27 Oca 2024 Cmt, 06:01 tarihinde şunu
yazdı:

> On Wed, 3 Jan 2024 at 16:56, Melih Mutlu  wrote:
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> 729439607ad210dbb446e31754e8627d7e3f7dda ===
> === applying patch
> ./v2-0001-Separate-memory-contexts-for-relcache-and-catcach.patch
> patching file src/backend/utils/cache/catcache.c
> ...
> Hunk #8 FAILED at 1933.
> Hunk #9 succeeded at 2253 (offset 84 lines).
> 1 out of 9 hunks FAILED -- saving rejects to file
> src/backend/utils/cache/catcache.c.rej
>
> Please post an updated version for the same.
>
> [1] - http://cfbot.cputube.org/patch_46_4554.log
>
> Regards,
> Vignesh
>

Rebased. PSA.


-- 
Melih Mutlu
Microsoft


v3-0001-Separate-memory-contexts-for-relcache-and-catcach.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Bertrand Drouvot
Hi,

On Wed, Apr 03, 2024 at 05:12:12PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 4:19 PM Amit Kapila  wrote:
> >
> > + 'postgres',
> > + "SELECT '$inactive_since_on_primary'::timestamptz <
> > '$inactive_since_on_standby'::timestamptz AND
> > + '$inactive_since_on_standby'::timestamptz < 
> > '$slot_sync_time'::timestamptz;"
> >
> > Shall we do <= check as we are doing in the main function
> > get_slot_inactive_since_value as the time duration is less so it can
> > be the same as well? Similarly, please check other tests.
> 
> I get you. If the tests are so fast that losing a bit of precision
> might cause tests to fail. So, I'v added equality check for all the
> tests.

> Please find the attached v32-0001 patch with the above review comments
> addressed.

Thanks!

Just one comment on v32-0001:

+# Synced slot on the standby must get its own inactive_since.
+is( $standby1->safe_psql(
+   'postgres',
+   "SELECT '$inactive_since_on_primary'::timestamptz <= 
'$inactive_since_on_standby'::timestamptz AND
+   '$inactive_since_on_standby'::timestamptz <= 
'$slot_sync_time'::timestamptz;"
+   ),
+   "t",
+   'synchronized slot has got its own inactive_since');
+

By using <= we are not testing that it must get its own inactive_since (as we
allow them to be equal in the test). I think we should just add some usleep()
where appropriate and deny equality during the tests on inactive_since.

Except for the above, v32-0001 LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




[PATCH] Modify pg_ctl to detect presence of geek user

2024-04-03 Thread Panda Developpeur
Dear PostgreSQL Hackers,

I am submitting a patch to modify pg_ctl to detect the presence of a geek
user on the system and adapt its behavior accordingly. This patch
introduces the following changes:

   1.

   *Detection of geek user*: The modified pg_ctl now checks user created on
   the computer.
   2.

   *No documentation or tests*: Please note that I have not included new
   documentation or tests in this patch submission. However, I am open to
   adding them based on the community's feedback.
   3.

   *Performance impact*: The performance impact of these changes is
   minimal, with an expected delay of 500ms in specific scenarios only.


Please review the patch and provide your feedback. I am open to making any
necessary improvements based on the community's suggestions.

Thank you for considering my contribution.

Best regards,


0001-Geek-detection.patch
Description: Binary data


Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-04-03 Thread Melih Mutlu
Hi,

Michael Paquier , 14 Şub 2024 Çar, 10:23 tarihinde
şunu yazdı:

> On Fri, Jan 19, 2024 at 05:41:45PM +0900, torikoshia wrote:
> > We already have additional description below the table which explains
> each
> > column of the system view. For example pg_locks:
> > https://www.postgresql.org/docs/devel/view-pg-locks.html
>
> I was reading the patch, and using int[] as a representation of the
> path of context IDs up to the top-most parent looks a bit strange to
> me, with the relationship between each parent -> child being
> preserved, visibly, based on the order of the elements in this array
> made of temporary IDs compiled on-the-fly during the function
> execution.  Am I the only one finding that a bit strange?  Could it be
> better to use a different data type for this path and perhaps switch
> to the names of the contexts involved?
>

Do you find having the path column strange all together? Or only using
temporary IDs to generate that column? The reason why I avoid using context
names is because there can be multiple contexts with the same name. This
makes it difficult to figure out which context, among those with that
particular name, is actually included in the path. I couldn't find any
other information that is unique to each context.

Thanks,
-- 
Melih Mutlu
Microsoft


Re: [PATCH] Modify pg_ctl to detect presence of geek user

2024-04-03 Thread Bruce Momjian
On Wed, Apr  3, 2024 at 04:17:21PM +0300, Panda Developpeur wrote:
> Dear PostgreSQL Hackers,
> 
> I am submitting a patch to modify pg_ctl to detect the presence of a geek user
> on the system and adapt its behavior accordingly. This patch introduces the
> following changes:
> 
>  1. Detection of geek user: The modified pg_ctl now checks user created on the
> computer.
> 
>  2. No documentation or tests: Please note that I have not included new
> documentation or tests in this patch submission. However, I am open to
> adding them based on the community's feedback.
> 
>  3. Performance impact: The performance impact of these changes is minimal,
> with an expected delay of 500ms in specific scenarios only.
> 
> 
> Please review the patch and provide your feedback. I am open to making any
> necessary improvements based on the community's suggestions.
> 
> Thank you for considering my contribution.

Aside from an extra newline in the patch, I think this is ready to go!

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PATCH] Modify pg_ctl to detect presence of geek user

2024-04-03 Thread Bruce Momjian
On Wed, Apr  3, 2024 at 09:25:02AM -0400, Bruce Momjian wrote:
> On Wed, Apr  3, 2024 at 04:17:21PM +0300, Panda Developpeur wrote:
> > Dear PostgreSQL Hackers,
> > 
> > I am submitting a patch to modify pg_ctl to detect the presence of a geek 
> > user
> > on the system and adapt its behavior accordingly. This patch introduces the
> > following changes:
> > 
> >  1. Detection of geek user: The modified pg_ctl now checks user created on 
> > the
> > computer.
> > 
> >  2. No documentation or tests: Please note that I have not included new
> > documentation or tests in this patch submission. However, I am open to
> > adding them based on the community's feedback.
> > 
> >  3. Performance impact: The performance impact of these changes is minimal,
> > with an expected delay of 500ms in specific scenarios only.
> > 
> > 
> > Please review the patch and provide your feedback. I am open to making any
> > necessary improvements based on the community's suggestions.
> > 
> > Thank you for considering my contribution.
> 
> Aside from an extra newline in the patch, I think this is ready to go!

Also, it feels like the deadline for this patch was two days ago.  ;-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Jelte Fennema-Nio
On Tue, 2 Apr 2024 at 16:33, Robert Haas  wrote:
> Committed it, I did. My thanks for working on this issue, I extend.

Looking at the committed version of this patch, the pg_unreachable
calls seemed weird to me. 1 is actually incorrect, thus possibly
resulting in undefined behaviour. And for the other call an imho
better fix would be to remove the now 21 year unused enum variant,
instead of introducing its only reference in the whole codebase.

Attached are two trivial patches, feel free to remove both of the
pg_unreachable calls.
From 7f4279bb939e2d2707fdd0f471893d734959ab91 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v11 2/2] Remove PGRES_POLLING_ACTIVE

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It was left for backwards
compatibility for "awhile". I think that time has come.
---
 src/bin/psql/command.c  | 2 --
 src/interfaces/libpq/libpq-fe.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..9a163cf22cd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3811,8 +3811,6 @@ wait_until_connected(PGconn *conn)
 			case PGRES_POLLING_WRITING:
 forRead = false;
 continue;
-			case PGRES_POLLING_ACTIVE:
-pg_unreachable();
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..2459b0cf5e1 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,6 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may	  */
 	PGRES_POLLING_WRITING,		/* use select before polling again.   */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
- * compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
-- 
2.34.1

From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v11 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable was
introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
-- 
2.34.1



Re: pg_combinebackup --copy-file-range

2024-04-03 Thread Jakub Wartak
On Mon, Apr 1, 2024 at 9:46 PM Tomas Vondra
 wrote:
>
> Hi,
>
> I've been running some benchmarks and experimenting with various stuff,
> trying to improve the poor performance on ZFS, and the regression on XFS
> when using copy_file_range. And oh boy, did I find interesting stuff ...

[..]

Congratulations on great results!

> 4) after each pg_combinebackup run to pg_verifybackup, start the cluster
> to finish recovery, run pg_checksums --check (to check the patches don't
> produce something broken)

I've performed some follow-up small testing on all patches mentioned
here  (1..7), with the earlier developed nano-incremental-backup-tests
that helped detect some issues for Robert earlier during original
development. They all went fine in both cases:
- no special options when using pg_combinebackup
- using pg_combinebackup --copy-file-range --manifest-checksums=NONE

Those were:
test_across_wallevelminimal.sh
test_full_pri__incr_stby__restore_on_pri.sh
test_full_pri__incr_stby__restore_on_stby.sh
test_full_stby__incr_stby__restore_on_pri.sh
test_full_stby__incr_stby__restore_on_stby.sh
test_incr_after_timelineincrease.sh
test_incr_on_standby_after_promote.sh
test_many_incrementals_dbcreate_duplicateOID.sh
test_many_incrementals_dbcreate_filecopy_NOINCR.sh
test_many_incrementals_dbcreate_filecopy.sh
test_many_incrementals_dbcreate.sh
test_many_incrementals.sh
test_multixact.sh
test_pending_2pc.sh
test_reindex_and_vacuum_full.sh
test_repro_assert_RP.sh
test_repro_assert.sh
test_standby_incr_just_backup.sh
test_stuck_walsum.sh
test_truncaterollback.sh
test_unlogged_table.sh

> Now to the findings 
>
>
> 1) block alignment

[..]

> And I think we probably want to do this now, because this affects all
> tools dealing with incremental backups - even if someone writes a custom
> version of pg_combinebackup, it will have to deal with misaligned data.
> Perhaps there might be something like pg_basebackup that "transforms"
> the data received from the server (and also the backup manifest), but
> that does not seem like a great direction.

If anything is on the table, then I think in the far future
pg_refresh_standby_using_incremental_backup_from_primary would be the
only other tool using the format ?

> 2) prefetch
> ---
[..]
> I think this means we may need a "--prefetch" option, that'd force
> prefetching, probably both before pread and copy_file_range. Otherwise
> people on ZFS are doomed and will have poor performance.

Right, we could optionally cover in the docs later-on various options
to get the performance (on XFS use $this, but without $that and so
on). It's kind of madness dealing with all those performance
variations.

Another idea: remove that 128 posifx_fadvise() hardcode in 0002 and a
getopt variant like: --prefetch[=HOWMANY] with 128 being as default ?

-J.




Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Here’s a revision of the Debian patch that requires a server start.

However, in studying the patch, it appears that the `extension_directory` is 
searched for *all* shared libraries, not just those being loaded for an 
extension. Am I reading the `expand_dynamic_library_name()` function right?

If so, this seems like a good way for a bad actor to muck with things, by 
putting an exploited libpgtypes library into the extension directory, where it 
would be loaded in preference to the core libpgtypes library, if they couldn’t 
exploit the original.

I’m thinking it would be better to have the dynamic library lookup for 
extension libraries (and LOAD libraries?) separate, so that the 
`extension_directory` would not be used for core libraries.

This would also allow the lookup of extension libraries prefixed by the 
directory field from the control file, which would enable much tidier extension 
installation: The control file, SQL scripts, and DSOs could all be in a single 
directory for an extension.

Thoughts?

Best,

David




v1-0001-Add-extension_directory-GUC.patch
Description: Binary data


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Jacob Champion
On Tue, Apr 2, 2024 at 11:55 AM Daniel Gustafsson  wrote:
> The attached removes 1.0.2 support (meson build parts untested yet) with a few
> small touch ups of related documentation.  I haven't yet done the research on
> where that leaves LibreSSL since we don't really define anywhere what we
> support (so for we've gotten by assuming it's kind of sort 1.0.2 for the parts
> we care about which is skating on fairly thin ice).

As far as I can tell, no versions of LibreSSL so far provide
X509_get_signature_info(), so this patch is probably a bit too
aggressive.

--Jacob




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-04-03 Thread Alvaro Herrera
Hello,

On 2024-Apr-03, Alexander Lakhin wrote:

> I've managed to trigger an assert added by 53c2a97a9.
> Please try the following script against a server compiled with
> -DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
> define, it just simplifies reproducing...):

Ah yes, absolutely, we're missing to trade the correct SLRU bank lock
there.  This rewrite of that small piece should fix it.  Thanks for
reporting this.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
>From 44c39cf4bf258fb0b65aa1acc5f84e5d7f729eb1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 3 Apr 2024 16:00:24 +0200
Subject: [PATCH] Fix zeroing of pg_serial page without SLRU bank lock

---
 src/backend/storage/lmgr/predicate.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 3f378c0099..d5bbfbd4c6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -137,7 +137,7 @@
  *	SerialControlLock
  *		- Protects SerialControlData members
  *
- *	SerialSLRULock
+ *	SLRU per-bank locks
  *		- Protects SerialSlruCtl
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -908,20 +908,25 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(lock, LW_EXCLUSIVE);
-
 	if (isNewPage)
 	{
-		/* Initialize intervening pages. */
-		while (firstZeroPage != targetPage)
+		/* Initialize intervening pages; might involve trading locks */
+		for (;;)
 		{
-			(void) SimpleLruZeroPage(SerialSlruCtl, firstZeroPage);
+			lock = SimpleLruGetBankLock(SerialSlruCtl, firstZeroPage);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			slotno = SimpleLruZeroPage(SerialSlruCtl, firstZeroPage);
+			if (firstZeroPage == targetPage)
+break;
 			firstZeroPage = SerialNextPage(firstZeroPage);
+			LWLockRelease(lock);
 		}
-		slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
 	}
 	else
+	{
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+	}
 
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
-- 
2.39.2



Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Melanie Plageman
On Wed, Apr 3, 2024 at 8:39 AM Heikki Linnakangas  wrote:
>
> On 02/04/2024 16:11, Heikki Linnakangas wrote:
> > On 01/04/2024 20:22, Melanie Plageman wrote:
> >> Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
> >> you didn't modify them much/at all, but I noticed some things in my code
> >> that could be better.
> >
> > Ok, here's what I have now. I made a lot of small comment changes here
> > and there, and some minor local refactorings, but nothing major. I lost
> > track of all the individual changes I'm afraid, so I'm afraid you'll
> > have to just diff against the previous version if you want to see what's
> > changed. I hope I didn't break anything.
> >
> > I'm pretty happy with this now. I will skim through it one more time
> > later today or tomorrow, and commit. Please review once more if you have
> > a chance.
> >
> >> This probably doesn't belong here. I noticed spgdoinsert.c had a static
> >> function for sorting OffsetNumbers, but I didn't see anything general
> >> purpose anywhere else.
> >
> > I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
> > be nice to have just one copy of this in some common place, but I also
> > wasn't sure where to put it.
>
> One more version, with two small fixes:
>
> 1. I fumbled the offsetnumber-cmp function at the last minute so that it
> didn't compile. Fixed. that

I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable).

I applied them to your v13 and attached the diff.

> Off-list, Melanie reported that there is a small regression with the
> benchmark script she posted yesterday, after all, but I'm not able to
> reproduce that.

Actually, I think it was noise.

- Melanie

[1] 
https://www.postgresql.org/message-id/CAAKRu_aPqZkThyfr0USaHp-3cN_ruEdAHBKtNQJqXDTjWUz0rw%40mail.gmail.com
From 882e937c122f5e83bc9ba643443c1a27c807d82e Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 3 Apr 2024 10:12:44 -0400
Subject: [PATCH 3/3] comment updates

---
 src/backend/access/heap/pruneheap.c | 53 +
 src/backend/storage/ipc/procarray.c |  6 ++--
 src/include/access/heapam.h |  2 +-
 3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8ed44ba93d..4a6a4cee4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -328,7 +328,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required by
+ * all callers.
  *
  * reason indicates why the pruning is performed.  It is included in the WAL
  * record for debugging and analysis purposes, but otherwise has no effect.
@@ -393,6 +394,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.pagefrz.freeze_required = false;
 	if (prstate.freeze)
 	{
+		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
 		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
 		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
@@ -415,19 +417,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We keep track of whether
-	 * the page will be all_visible and all_frozen, once we're done with the
-	 * pruning and freezing, to help the caller to do that.
+	 * Caller may update the VM after we're done.  We can keep track of
+	 * whether the page will be all-visible and all-frozen after pruning and
+	 * freezing to help the caller to do that.
 	 *
 	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * only the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
-	 * to update the VM bits without also freezing, or freezing without
+	 * the bookkeeping if the caller needs it.  Currently, that's tied to
+	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
+	 * to update the VM bits without also freezing or freeze without also
 	 * setting the VM bits.
 	 *
 	 * In addition to telling the caller whether it can set the VM bit, we
 	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page will become frozen, we consider opportunistically
+	 * the whole page would become frozen, we consider opportunistically
 	 * freezing tuples.  We will not be able to freeze the whole page if there
 	 * are tuples present that are n

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-04-03 Thread Dilip Kumar
On Wed, Apr 3, 2024 at 7:40 PM Alvaro Herrera  wrote:
>
> Hello,
>
> On 2024-Apr-03, Alexander Lakhin wrote:
>
> > I've managed to trigger an assert added by 53c2a97a9.
> > Please try the following script against a server compiled with
> > -DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
> > define, it just simplifies reproducing...):
>
> Ah yes, absolutely, we're missing to trade the correct SLRU bank lock
> there.  This rewrite of that small piece should fix it.  Thanks for
> reporting this.
>

Yeah, we missed acquiring the bank lock w.r.t. intervening pages,
thanks for reporting.  Your fix looks correct to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Tristan Partin

On Wed Apr 3, 2024 at 8:32 AM CDT, Jelte Fennema-Nio wrote:

On Tue, 2 Apr 2024 at 16:33, Robert Haas  wrote:
> Committed it, I did. My thanks for working on this issue, I extend.

Looking at the committed version of this patch, the pg_unreachable
calls seemed weird to me. 1 is actually incorrect, thus possibly
resulting in undefined behaviour. And for the other call an imho
better fix would be to remove the now 21 year unused enum variant,
instead of introducing its only reference in the whole codebase.

Attached are two trivial patches, feel free to remove both of the
pg_unreachable calls.


Patches look good. Sorry about causing you to do some work.

--
Tristan Partin
Neon (https://neon.tech)




Re: Is it safe to cache data by GiST consistent function

2024-04-03 Thread Tom Lane
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?=  writes:
> When implementing a GiST consistent function I found the need to cache 
> pre-processed query across invocations.
> I am not sure if it is safe to do (or I need to perform some steps to make 
> sure cached info is not leaked between rescans).

AFAIK it works.  I don't see any of the in-core ones doing so,
but at least range_gist_consistent and multirange_gist_consistent
are missing a bet by repeating their cache search every time.

> The comment in gistrescan says:

>   /*
>* If this isn't the first time through, preserve the fn_extra
>* pointers, so that if the consistentFns are using them to 
> cache
>* data, that data is not leaked across a rescan.
>*/

> which seems to me self-contradictory as fn_extra is preserved between rescans 
> (so leaks are indeed possible).

I think you're reading it wrong.  If we cleared fn_extra during
rescan, access to the old extra value would be lost so a new one
would have to be created, leaking the old value for the rest of
the query.

regards, tom lane




Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Tom Lane
Jelte Fennema-Nio  writes:
> Looking at the committed version of this patch, the pg_unreachable
> calls seemed weird to me. 1 is actually incorrect, thus possibly
> resulting in undefined behaviour. And for the other call an imho
> better fix would be to remove the now 21 year unused enum variant,
> instead of introducing its only reference in the whole codebase.

If we do the latter, we will almost certainly get pushback from
distros who check for library ABI breaks.  I fear the comment
suggesting that we could remove it someday is too optimistic.

regards, tom lane




Re: remaining sql/json patches

2024-04-03 Thread jian he
hi.
+  
+   json_table is an SQL/JSON function which
+   queries JSON data
+   and presents the results as a relational view, which can be accessed as a
+   regular SQL table. You can only use
json_table inside the
+   FROM clause of a SELECT,
+   UPDATE, DELETE, or
MERGE
+   statement.
+  

the only issue is that MERGE Synopsis don't have
FROM clause.
other than that, it's quite correct.
see following tests demo:

drop table ss;
create table ss(a int);
insert into ss select 1;
delete from ss using JSON_TABLE(jsonb '1', '$' COLUMNS (a int PATH '$'
) ERROR ON ERROR) jt where jt.a = 1;
insert into ss select 2;
update ss set a = 1 from JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
'$')) jt where jt.a = 2;
DROP TABLE IF EXISTS target;
CREATE TABLE target (tid integer, balance integer) WITH
(autovacuum_enabled=off);
INSERT INTO target VALUES (1, 10),(2, 20),(3, 30);
MERGE INTO target USING JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
'$' ) ERROR ON ERROR) source(sid)
ON target.tid = source.sid
WHEN MATCHED THEN UPDATE SET balance = 0
returning *;
--

+  
+   To split the row pattern into columns, json_table
+   provides the COLUMNS clause that defines the
+   schema of the created view. For each column, a separate path expression
+   can be specified to be evaluated against the row pattern to get a
+   SQL/JSON value that will become the value for the specified column in
+   a given output row.
+  
should be "an SQL/JSON".

+
+ Inserts a SQL/JSON value obtained by applying
+ path_expression against the row pattern into
+ the view's output row after coercing it to specified
+ type.
+
should be "an SQL/JSON".

"coercing it to specified type"
should be
"coercing it to the specified type"?
---
+
+ The value corresponds to whether evaluating the PATH
+ expression yields any SQL/JSON values.
+
maybe we can change to
+
+ The value corresponds to whether applying the
path_expression
+ expression yields any SQL/JSON values.
+
so it looks more consistent with the preceding paragraph.

+
+ Optionally, ON ERROR can be used to specify whether
+ to throw an error or return the specified value to handle structural
+ errors, respectively.  The default is to return a boolean value
+ FALSE.
+
we don't need "respectively" here?

+ if (jt->on_error &&
+ jt->on_error->btype != JSON_BEHAVIOR_ERROR &&
+ jt->on_error->btype != JSON_BEHAVIOR_EMPTY &&
+ jt->on_error->btype != JSON_BEHAVIOR_EMPTY_ARRAY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid ON ERROR behavior"),
+ errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
+ parser_errposition(pstate, jt->on_error->location));

errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
maybe change to something like:
`
errdetail("Only EMPTY or ERROR is allowed for ON ERROR in the
top-level JSON_TABLE() ").
`
i guess mentioning "top-level" is fine.
since "top-level", we have 19  appearances in functions-json.html.




Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Jelte Fennema-Nio
On Wed, 3 Apr 2024 at 16:31, Tom Lane  wrote:
> If we do the latter, we will almost certainly get pushback from
> distros who check for library ABI breaks.  I fear the comment
> suggesting that we could remove it someday is too optimistic.

Alright, changed patch 0002 to keep the variant. But remove it from
the recently added switch/case. And also updated the comment to remove
the "for awhile".
From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v12 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable was
introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
-- 
2.34.1

From f71634361a0c9ba3416c75fc3ff6d55536f134c4 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v12 2/2] Remove PGRES_POLLING_ACTIVE case

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It's been there only for
backwards compatibility of the ABI, so no need to check for it.

Also update the comment on PGRES_POLLING_ACTIVE, since we're stuck with
it forever due to ABI compatibility guarantees.
---
 src/bin/psql/command.c  | 2 --
 src/interfaces/libpq/libpq-fe.h | 3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..9a163cf22cd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3811,8 +3811,6 @@ wait_until_connected(PGconn *conn)
 			case PGRES_POLLING_WRITING:
 forRead = false;
 continue;
-			case PGRES_POLLING_ACTIVE:
-pg_unreachable();
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..c184e853889 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,7 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may	  */
 	PGRES_POLLING_WRITING,		/* use select before polling again.   */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
- * compatibility */
+	PGRES_POLLING_ACTIVE		/* unused; keep for backwards compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
-- 
2.34.1



Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Tristan Partin

On Wed Apr 3, 2024 at 9:50 AM CDT, Jelte Fennema-Nio wrote:

On Wed, 3 Apr 2024 at 16:31, Tom Lane  wrote:
> If we do the latter, we will almost certainly get pushback from
> distros who check for library ABI breaks.  I fear the comment
> suggesting that we could remove it someday is too optimistic.

Alright, changed patch 0002 to keep the variant. But remove it from
the recently added switch/case. And also updated the comment to remove
the "for awhile".


Removing from the switch statement causes a warning:


[1/2] Compiling C object src/bin/psql/psql.p/command.c.o
../src/bin/psql/command.c: In function ‘wait_until_connected’:
../src/bin/psql/command.c:3803:17: warning: enumeration value 
‘PGRES_POLLING_ACTIVE’ not handled in switch [-Wswitch]
 3803 | switch (PQconnectPoll(conn))
  | ^~


--
Tristan Partin
Neon (https://neon.tech)




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
 wrote:
>
> Just one comment on v32-0001:
>
> +# Synced slot on the standby must get its own inactive_since.
> +is( $standby1->safe_psql(
> +   'postgres',
> +   "SELECT '$inactive_since_on_primary'::timestamptz <= 
> '$inactive_since_on_standby'::timestamptz AND
> +   '$inactive_since_on_standby'::timestamptz <= 
> '$slot_sync_time'::timestamptz;"
> +   ),
> +   "t",
> +   'synchronized slot has got its own inactive_since');
> +
>
> By using <= we are not testing that it must get its own inactive_since (as we
> allow them to be equal in the test). I think we should just add some usleep()
> where appropriate and deny equality during the tests on inactive_since.

Thanks. It looks like we can ignore the equality in all of the
inactive_since comparisons. IIUC, all the TAP tests do run with
primary and standbys on the single BF animals. And, it looks like
assigning the inactive_since timestamps to perl variables is giving
the microseconds precision level
(./tmp_check/log/regress_log_040_standby_failover_slots_sync:inactive_since
2024-04-03 14:30:09.691648+00). FWIW, we already have some TAP and SQL
tests relying on stats_reset timestamps without equality. So, I've
left the equality for the inactive_since tests.

> Except for the above, v32-0001 LGTM.

Thanks. Please see the attached v33-0001 patch after removing equality
on inactive_since TAP tests.

On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot
 wrote:
>
> Some comments regarding v31-0002:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots don't get invalidated due to inactive timeout because
> > such slots not considered active at all as they don't perform logical
> > decoding (of course, they will perform in fast_forward mode to fix the
> > other data loss issue, but they don't generate changes for them to be
> > called as *active* slots)
>
> It behaves as described. OTOH non synced logical slots on the standby and
> physical slots on the standby are invalidated which is what is expected.

Right.

> T2 ===
>
> In case the slot is invalidated on the primary,
>
> primary:
>
> postgres=# select slot_name, inactive_since, invalidation_reason from 
> pg_replication_slots where slot_name = 's1';
>  slot_name |inactive_since | invalidation_reason
> ---+---+-
>  s1| 2024-04-03 06:56:28.075637+00 | inactive_timeout
>
> then on the standby we get:
>
> standby:
>
> postgres=# select slot_name, inactive_since, invalidation_reason from 
> pg_replication_slots where slot_name = 's1';
>  slot_name |inactive_since| invalidation_reason
> ---+--+-
>  s1| 2024-04-03 07:06:43.37486+00 | inactive_timeout
>
> shouldn't the slot be dropped/recreated instead of updating inactive_since?

The sync slots that are invalidated on the primary aren't dropped and
recreated on the standby. There's no point in doing so because
invalidated slots on the primary can't be made useful. However, I
found that the synced slot is acquired and released unnecessarily
after the invalidation_reason is synced from the primary. I added a
skip check in synchronize_one_slot to skip acquiring and releasing the
slot if it's locally found inactive. With this, inactive_since won't
get updated for invalidated sync slots on the standby as we don't
acquire and release the slot.

> === code
>
> CR1 ===
>
> +Invalidates replication slots that are inactive for longer the
> +specified amount of time
>
> s/for longer the/for longer that/?

Fixed.

> CR2 ===
>
> +true) as such synced slots don't actually perform
> +logical decoding.
>
> We're switching in fast forward logical due to [1], so I'm not sure that's 
> 100%
> accurate here. I'm not sure we need to specify a reason.

Fixed.

> CR3 ===
>
> + errdetail("This slot has been invalidated because it was inactive for more 
> than the time specified by replication_slot_inactive_timeout parameter.")));
>
> I think we can remove "parameter" (see for example the error message in
> validate_remote_info()) and reduce it a bit, something like?
>
> "This slot has been invalidated because it was inactive for more than 
> replication_slot_inactive_timeout"?

Done.

> CR4 ===
>
> + appendStringInfoString(&err_detail, _("The slot has been inactive for more 
> than the time specified by replication_slot_inactive_timeout parameter."));
>
> Same.

Done. Changed it to "The slot has been inactive for more than
replication_slot_inactive_timeout."

> CR5 ===
>
> +   /*
> +* This function isn't expected to be called for inactive timeout 
> based
> +* invalidation. A separate function 
> InvalidateInactiveReplicationSlot is
> +* to be used for that.
>
> Do you think it's worth to explain why?

Hm, I just wanted to point out the actual function here. I modifie

Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Jelte Fennema-Nio
On Wed, 3 Apr 2024 at 16:55, Tristan Partin  wrote:
> Removing from the switch statement causes a warning:
>
> > [1/2] Compiling C object src/bin/psql/psql.p/command.c.o
> > ../src/bin/psql/command.c: In function ‘wait_until_connected’:
> > ../src/bin/psql/command.c:3803:17: warning: enumeration value 
> > ‘PGRES_POLLING_ACTIVE’ not handled in switch [-Wswitch]
> >  3803 | switch (PQconnectPoll(conn))
> >   | ^~


Ofcourse... fixed now
From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v13 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable was
introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
-- 
2.34.1

From 76cef25162b44adc20172afee47836ca765d3b5c Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v13 2/2] Remove PGRES_POLLING_ACTIVE case

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It's been there only for
backwards compatibility of the ABI, so no need to check for it.

Also update the comment on PGRES_POLLING_ACTIVE, since we're stuck with
it forever due to ABI compatibility guarantees.
---
 src/bin/psql/command.c  | 7 ++-
 src/interfaces/libpq/libpq-fe.h | 3 +--
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..5b31d08bf70 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3802,17 +3802,14 @@ wait_until_connected(PGconn *conn)
 
 		switch (PQconnectPoll(conn))
 		{
-			case PGRES_POLLING_OK:
-			case PGRES_POLLING_FAILED:
-return;
 			case PGRES_POLLING_READING:
 forRead = true;
 continue;
 			case PGRES_POLLING_WRITING:
 forRead = false;
 continue;
-			case PGRES_POLLING_ACTIVE:
-pg_unreachable();
+			default:
+return;
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..c184e853889 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,7 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may	  */
 	PGRES_POLLING_WRITING,		/* use select before polling again.   */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
- * compatibility */
+	PGRES_POLLING_ACTIVE		/* unused; keep for backwards compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
-- 
2.34.1



Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Tristan Partin

On Wed Apr 3, 2024 at 10:05 AM CDT, Jelte Fennema-Nio wrote:

On Wed, 3 Apr 2024 at 16:55, Tristan Partin  wrote:
> Removing from the switch statement causes a warning:
>
> > [1/2] Compiling C object src/bin/psql/psql.p/command.c.o
> > ../src/bin/psql/command.c: In function ‘wait_until_connected’:
> > ../src/bin/psql/command.c:3803:17: warning: enumeration value 
‘PGRES_POLLING_ACTIVE’ not handled in switch [-Wswitch]
> >  3803 | switch (PQconnectPoll(conn))
> >   | ^~


Ofcourse... fixed now


I think patch 2 makes it worse. The value in -Wswitch is that when new 
enum variants are added, the developer knows the locations to update. 
Adding a default case makes -Wswitch pointless.


Patch 1 is still good. The comment change in patch 2 is good too!

--
Tristan Partin
Neon (https://neon.tech)




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Tom Lane
Jacob Champion  writes:
> As far as I can tell, no versions of LibreSSL so far provide
> X509_get_signature_info(), so this patch is probably a bit too
> aggressive.

Another problem with cutting support is how many buildfarm members
will we lose.  I scraped recent configure logs and got the attached
results.  I count 3 machines running 1.0.1, 18 running some flavor
of 1.0.2, and 7 running various LibreSSL versions.  We could
probably retire or update the 1.0.1 installations, but the rest
would represent a heavier lift.  Notably, it seems that what macOS
is shipping is LibreSSL.

regards, tom lane

sysname|  l 
  
---+--
 alabio| configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 alimoche  | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 arowana   | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 avocet| configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 
SUSE release 150400.7.60.2
 ayu   | configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 babbler   | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 basilisk  | configure: using openssl: OpenSSL 3.1.4 24 Oct 2023 (Library: 
OpenSSL 3.1.4 24 Oct 2023)
 batfish   | configure: using openssl: OpenSSL 1.0.2g  1 Mar 2016
 batta | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 blackneck | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 boa   | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 boomslang | configure: using openssl: OpenSSL 1.1.1k  25 Mar 2021
 broadbill | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 bulbul| configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 buri  | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 bushmaster| configure: using openssl: OpenSSL 3.1.5 30 Jan 2024 (Library: 
OpenSSL 3.1.5 30 Jan 2024)
 butterflyfish | configure: using openssl: OpenSSL 1.0.2p-fips  14 Aug 2018
 caiman| configure: using openssl: OpenSSL 3.2.1 30 Jan 2024 (Library: 
OpenSSL 3.2.1 30 Jan 2024)
 canebrake | configure: using openssl: OpenSSL 3.1.5 30 Jan 2024 (Library: 
OpenSSL 3.1.5 30 Jan 2024)
 cascabel  | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 
(Library: OpenSSL 1.1.1w  11 Sep 2023)
 cavefish  | configure: using openssl: OpenSSL 1.1.1  11 Sep 2018
 chevrotain| configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 
(Library: OpenSSL 1.1.1w  11 Sep 2023)
 chimaera  | configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 chipmunk  | configure: using openssl: OpenSSL 1.0.1t  3 May 2016
 cisticola | configure: using openssl: OpenSSL 1.1.1g FIPS  21 Apr 2020
 clam  | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 conchuela | configure: using openssl: LibreSSL 3.2.5
 copperhead| configure: using openssl: OpenSSL 1.1.1k  25 Mar 2021
 culpeo| configure: using openssl: OpenSSL 3.0.11 19 Sep 2023 (Library: 
OpenSSL 3.0.11 19 Sep 2023)
 cuon  | configure: using openssl: OpenSSL 1.0.2g  1 Mar 2016
 demoiselle| configure: using openssl: OpenSSL 1.1.0h-fips  27 Mar 2018
 desman| configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: 
OpenSSL 3.0.9 30 May 2023)
 dhole | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 dikkop| configure: using openssl: OpenSSL 3.0.10 1 Aug 2023 (Library: 
OpenSSL 3.0.10 1 Aug 2023)
 elasmobranch  | configure: using openssl: OpenSSL 1.1.0h-fips  27 Mar 2018
 gokiburi  | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 grison| configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 grison| configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 guaibasaurus  | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 gull  | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 habu  | configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: 
OpenSSL 3.0.9 30 May 2023)
 hachi | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 hake  | configure: using openssl: OpenSSL 1.0.2u  20 Dec 2019
 hippopotamus  | configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 
SUSE release 150400.7.60.2
 indri | configure: using openssl: OpenSSL 3.2.0 23 Nov 2023 (Library: 
OpenSSL 3.2.0 23 Nov 2023)
 jackdaw   | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 
(Library: OpenSSL 1.1.1w  11 Sep 2023)
 jay   | configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 
SUSE release 150400.7.60.2
 kingsnake | configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: 
OpenSSL 3.0.9 30 May 2023)
 krait | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 lancehead | configure: using open

Re: [PATCH] Modify pg_ctl to detect presence of geek user

2024-04-03 Thread Panda Developpeur
Yeah sorry for the delay, it took me some time to understood how build,
modify and test the modification


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-04-03 Thread Alvaro Herrera
On 2024-Apr-03, Dilip Kumar wrote:

> Yeah, we missed acquiring the bank lock w.r.t. intervening pages,
> thanks for reporting.  Your fix looks correct to me.

Thanks for the quick review!  And thanks to Alexander for the report.
Pushed the fix.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Bertrand Drouvot
Hi,

On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
>  wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > +   'postgres',
> > +   "SELECT '$inactive_since_on_primary'::timestamptz <= 
> > '$inactive_since_on_standby'::timestamptz AND
> > +   '$inactive_since_on_standby'::timestamptz <= 
> > '$slot_sync_time'::timestamptz;"
> > +   ),
> > +   "t",
> > +   'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as 
> > we
> > allow them to be equal in the test). I think we should just add some 
> > usleep()
> > where appropriate and deny equality during the tests on inactive_since.
> 
> > Except for the above, v32-0001 LGTM.
> 
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.

Thanks! v33-0001 LGTM.

> On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot
>  wrote:
> > Some comments regarding v31-0002:
> >
> > T2 ===
> >
> > In case the slot is invalidated on the primary,
> >
> > primary:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from 
> > pg_replication_slots where slot_name = 's1';
> >  slot_name |inactive_since | invalidation_reason
> > ---+---+-
> >  s1| 2024-04-03 06:56:28.075637+00 | inactive_timeout
> >
> > then on the standby we get:
> >
> > standby:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from 
> > pg_replication_slots where slot_name = 's1';
> >  slot_name |inactive_since| invalidation_reason
> > ---+--+-
> >  s1| 2024-04-03 07:06:43.37486+00 | inactive_timeout
> >
> > shouldn't the slot be dropped/recreated instead of updating inactive_since?
> 
> The sync slots that are invalidated on the primary aren't dropped and
> recreated on the standby.

Yeah, right (I was confused with synced slots that are invalidated locally).

> However, I
> found that the synced slot is acquired and released unnecessarily
> after the invalidation_reason is synced from the primary. I added a
> skip check in synchronize_one_slot to skip acquiring and releasing the
> slot if it's locally found inactive. With this, inactive_since won't
> get updated for invalidated sync slots on the standby as we don't
> acquire and release the slot.

CR1 ===

Yeah, I can see:

@@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid 
remote_dbid)
   " name slot \"%s\" already 
exists on the standby",
   remote_slot->name));

+   /*
+* Skip the sync if the local slot is already invalidated. We 
do this
+* beforehand to save on slot acquire and release.
+*/
+   if (slot->data.invalidated != RS_INVAL_NONE)
+   return false;

Thanks to the drop_local_obsolete_slots() call I think we are not missing the 
case
where the slot has been invalidated on the primary, invalidation reason has been
synced on the standby and later the slot is dropped/ recreated manually on the
primary (then it should be dropped/recreated on the standby too).

Also it seems we are not missing the case where a sync slot is invalidated
locally due to wal removal (it should be dropped/recreated).

> 
> > CR5 ===
> >
> > +   /*
> > +* This function isn't expected to be called for inactive timeout 
> > based
> > +* invalidation. A separate function 
> > InvalidateInactiveReplicationSlot is
> > +* to be used for that.
> >
> > Do you think it's worth to explain why?
> 
> Hm, I just wanted to point out the actual function here. I modified it
> to something like the following, if others feel we don't need that, I
> can remove it.

Sorry If I was not clear but I meant to say "Do you think it's worth to explain
why we decided to create a dedicated function"? (currently we "just" explain 
that
we created one).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Modify pg_ctl to detect presence of geek user

2024-04-03 Thread Andrey M. Borodin



> On 3 Apr 2024, at 18:17, Panda Developpeur  
> wrote:
> 
> Thank you for considering my contribution.

Looks interesting!

+   usleep(50);

Don't we need to make system 500ms faster instead? Let's change it to

+   usleep(-50);

Thanks!


Best regards, Andrey Borodin.



Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Heikki Linnakangas

On 03/04/2024 17:18, Melanie Plageman wrote:

I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable).

I applied them to your v13 and attached the diff.


Applied those changes, and committed. Thank you!


Off-list, Melanie reported that there is a small regression with the
benchmark script she posted yesterday, after all, but I'm not able to
reproduce that.


Actually, I think it was noise.


Ok, phew.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: RFC: Additional Directory for Extensions

2024-04-03 Thread Christoph Berg
Re: David E. Wheeler
> > Yes, I like the suggestion to make it require a restart, which lets the 
> > sysadmin control it and not limited to whatever the person who compiled it 
> > thought would make sense.
> 
> Here’s a revision of the Debian patch that requires a server start.

Thanks for bringing this up, I should have submitted this years ago.
(The patch is originally from September 2020.)

I designed the patch to require a superuser to set it, so it doesn't
matter very much by which mechanism it gets updated. There should be
little reason to vary it at run-time, so I'd be fine with requiring a
restart, but otoh, why restrict the superuser from reloading it if
they know what they are doing?

> However, in studying the patch, it appears that the `extension_directory` is 
> searched for *all* shared libraries, not just those being loaded for an 
> extension. Am I reading the `expand_dynamic_library_name()` function right?
> 
> If so, this seems like a good way for a bad actor to muck with things, by 
> putting an exploited libpgtypes library into the extension directory, where 
> it would be loaded in preference to the core libpgtypes library, if they 
> couldn’t exploit the original.
> 
> I’m thinking it would be better to have the dynamic library lookup for 
> extension libraries (and LOAD libraries?) separate, so that the 
> `extension_directory` would not be used for core libraries.

I'm not sure the concept of "core libraries" exists. PG happens to
dlopen things at run time, and it doesn't know/care if they were
installed by users or by the original PG server. Also, an exploited
libpgtypes library is not worse than any other untrusted "user"
library, so you really don't want to allow users to provide their own
.so files, no matter by what mechanism.

> This would also allow the lookup of extension libraries prefixed by the 
> directory field from the control file, which would enable much tidier 
> extension installation: The control file, SQL scripts, and DSOs could all be 
> in a single directory for an extension.

Nice idea, but that would mean installing .so files into PGSHAREDIR.
Perhaps the whole extension stuff would have to move to PKGLIBDIR
instead.


Fwiw, I wrote this patch to solve the problem of testing extensions at
build-time where the build process does not have write access to
PGSHAREDIR. It solves that problem quite well, almost all PG
extensions have build-time test coverage now (where there was
basically 0 before).

Security is not a concern at this point as everything is running as
the same user, and the test cluster will be wiped right after the
test. I figured marking the setting as "super user" only was enough
security at that point, but I would recommend another audit before
using it together with "trusted" extensions and other things in
production.

Christoph




Re: Is it safe to cache data by GiST consistent function

2024-04-03 Thread Michał Kłeczek
Thanks for taking your time to answer. Not sure if I understand though.

> On 3 Apr 2024, at 16:27, Tom Lane  wrote:
> 
> =?utf-8?Q?Micha=C5=82_K=C5=82eczek?=  writes:
>> When implementing a GiST consistent function I found the need to cache 
>> pre-processed query across invocations.
>> I am not sure if it is safe to do (or I need to perform some steps to make 
>> sure cached info is not leaked between rescans).
> 
> AFAIK it works.  I don't see any of the in-core ones doing so,
> but at least range_gist_consistent and multirange_gist_consistent
> are missing a bet by repeating their cache search every time.

pg_trgm consistent caches tigrams but it has some logic to make sure cached 
values are recalculated:

cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
if (cache == NULL ||
cache->strategy != strategy ||
VARSIZE(cache->query) != querysize ||
memcmp((char *) cache->query, (char *) query, querysize) != 0)

What I don’t understand is if it is necessary or it is enough to check 
fn_extra==NULL.

> 
>> The comment in gistrescan says:
> 
>>  /*
>>   * If this isn't the first time through, preserve the fn_extra
>>   * pointers, so that if the consistentFns are using them to 
>> cache
>>   * data, that data is not leaked across a rescan.
>>   */
> 
>> which seems to me self-contradictory as fn_extra is preserved between 
>> rescans (so leaks are indeed possible).
> 
> I think you're reading it wrong.  If we cleared fn_extra during
> rescan, access to the old extra value would be lost so a new one
> would have to be created, leaking the old value for the rest of
> the query.

I understand that but not sure what “that data is not leaked across a rescan” 
means.

—
Michal



Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alvaro Herrera
Hello Alexander,

On 2024-Apr-03, Alexander Korotkov wrote:

> Regarding the shmem data structure for LSN waiters.  I didn't pick
> LWLock or ConditionVariable, because I needed the ability to wake up
> only those waiters whose LSN is already replayed.  In my experience
> waking up a process is way slower than scanning a short flat array.

I agree, but I think that's unrelated to what I was saying, which is
just the patch I attach here.

> However, I agree that when the number of waiters is very high and flat
> array may become a problem.  It seems that the pairing heap is not
> hard to use for shmem structures.  The only memory allocation call in
> paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> be able to initialize the pairing heap in-place, and it will be fine
> for shmem.

Ok.

With the code as it stands today, everything in WaitLSNState apart from
the pairing heap is accessed without any locking.  I think this is at
least partly OK because each backend only accesses its own entry; but it
deserves a comment.  Or maybe something more, because WaitLSNSetLatches
does modify the entry for other backends.  (Admittedly, this could only
happens for backends that are already sleeping, and it only happens
with the lock acquired, so it's probably okay.  But clearly it deserves
a comment.)

Don't we need to WaitLSNCleanup() during error recovery or something?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)
>From 4079dc6a6a6893055b32dee75f178b324bbaef77 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 3 Apr 2024 18:35:15 +0200
Subject: [PATCH] Use an LWLock instead of a spinlock in waitlsn.c

---
 src/backend/commands/waitlsn.c  | 15 +++
 src/backend/utils/activity/wait_event_names.txt |  1 +
 src/include/commands/waitlsn.h  |  5 +
 src/include/storage/lwlocklist.h|  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 51a34d422e..a57b818a2d 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -58,7 +58,6 @@ WaitLSNShmemInit(void)
 			   &found);
 	if (!found)
 	{
-		SpinLockInit(&waitLSN->waitersHeapMutex);
 		pg_atomic_init_u64(&waitLSN->minWaitedLSN, PG_UINT64_MAX);
 		pairingheap_initialize(&waitLSN->waitersHeap, lsn_cmp, NULL);
 		memset(&waitLSN->procInfos, 0, MaxBackends * sizeof(WaitLSNProcInfo));
@@ -115,13 +114,13 @@ addLSNWaiter(XLogRecPtr lsn)
 	procInfo->procnum = MyProcNumber;
 	procInfo->waitLSN = lsn;
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	pairingheap_add(&waitLSN->waitersHeap, &procInfo->phNode);
 	procInfo->inHeap = true;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -132,11 +131,11 @@ deleteLSNWaiter(void)
 {
 	WaitLSNProcInfo *procInfo = &waitLSN->procInfos[MyProcNumber];
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	if (!procInfo->inHeap)
 	{
-		SpinLockRelease(&waitLSN->waitersHeapMutex);
+		LWLockRelease(WaitLSNLock);
 		return;
 	}
 
@@ -144,7 +143,7 @@ deleteLSNWaiter(void)
 	procInfo->inHeap = false;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -160,7 +159,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	wakeUpProcNums = palloc(sizeof(int) * MaxBackends);
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	/*
 	 * Iterate the pairing heap of waiting processes till we find LSN not yet
@@ -182,7 +181,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 
 	/*
 	 * Set latches for processes, whose waited LSNs are already replayed. This
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0d288d6b3d..ec43a78d60 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -330,6 +330,7 @@ WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared pg_serial state."
+WaitLSN	"Waiting to read or update shared Wait-for-LSN state."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index 0d80248682..b3d9eed64d 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -55,13 +55,10 @@ typedef struct WaitLSNState
 
 	/*
 	 * A pairing heap of wai

Re: Detoasting optionally to make Explain-Analyze less misleading

2024-04-03 Thread Matthias van de Meent
On Tue, 2 Apr 2024 at 17:47, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > Attached is v9, which is rebased on master to handle 24eebc65's
> > changed signature of pq_sendcountedtext.
> > It now also includes autocompletion, and a second patch which adds
> > documentation to give users insights into this new addition to
> > EXPLAIN.
>
> I took a quick look through this.  Some comments in no particular
> order:

Thanks!

> Documentation is not optional, so I don't really see the point of
> splitting this into two patches.

I've seen the inverse several times, but I've merged them in the
attached version 10.

> IIUC, it's not possible to use the SERIALIZE option when explaining
> CREATE TABLE AS, because you can't install the instrumentation tuple
> receiver when the IntoRel one is needed.  I think that's fine because
> no serialization would happen in that case anyway, but should we
> throw an error if that combination is requested?  Blindly reporting
> that zero bytes were serialized seems like it'd confuse people.

I think it's easily explained as no rows were transfered to the
client. If there is actual confusion, we can document it, but
confusing disk with network is not a case I'd protect against. See
also: EXPLAIN (ANALYZE, SERIALIZE) INSERT without the RETURNING
clause.

> I'd lose the stuff about measuring memory consumption.  Nobody asked
> for that and the total is completely misleading, because in reality
> we'll reclaim the memory used after each row.  It would allow cutting
> the text-mode output down to one line, too, instead of having your
> own format that's not like anything else.

Done.

> I thought the upthread agreement was to display the amount of
> data sent rounded to kilobytes, so why is the code displaying
> an exact byte count?

Probably it was because the other explain code I referenced was using
bytes in the json/yaml format. Fixed.

> I don't especially care for magic numbers like these:
>
> +   /* see printtup.h why we add 18 bytes here. These are the 
> infos
> +* needed for each attribute plus the attribute's name */
> +   receiver->metrics.bytesSent += (int64) namelen + 1 + 18;
>
> If the protocol is ever changed in a way that invalidates this,
> there's about 0 chance that somebody would remember to touch
> this code.
> However, isn't the bottom half of serializeAnalyzeStartup doing
> exactly what the comment above it says we don't do, that is accounting
> for the RowDescription message?  Frankly I agree with the comment that
> it's not worth troubling over, so I'd just drop that code rather than
> finding a solution for the magic-number problem.

In the comment above I intended to explain that it takes negligible
time to serialize the RowDescription message (when compared to all
other tasks of explain), so skipping the actual writing of the message
would be fine.
I'm not sure I agree with not including the size of RowDescription
itself though, as wide results can have a very large RowDescription
overhead; up to several times the returned data in cases where few
rows are returned.

Either way, I've removed that part of the code.

> Don't bother with duplicating valgrind-related logic in
> serializeAnalyzeReceive --- that's irrelevant to actual users.

Removed. I've instead added buffer usage, as I realised that wasn't
covered yet, and quite important to detect excessive detoasting (it's
not covered at the top-level scan).

> This seems like cowboy coding:
>
> +   self->destRecevier.mydest = DestNone;
>
> You should define a new value of the CommandDest enum and
> integrate this receiver type into the support functions
> in dest.c.

Done.

> BTW, "destRecevier" is misspelled...

Thanks, fixed.


Kind regards,

Matthias van de Meent.


v10-0001-Explain-Add-SERIALIZE-option.patch
Description: Binary data


Re: LogwrtResult contended spinlock

2024-04-03 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera  wrote:
>
> Thanks for keeping this moving forward.  I gave your proposed patches a
> look.   One thing I didn't like much is that we're adding a new member
> (Copy) to XLogwrtAtomic -- but this struct is supposed to be a mirror of
> XLogwrtResult for use with atomic access.  Since this new member is not
> added to XLogwrtResult (because it's not needed there), the whole idea
> of there being symmetry between those two structs crumbles down.
> Because we later stop using struct-assign anyway, meaning we no longer
> need the structs to match, we can instead spell out the members in
> XLogCtl and call it a day.

Hm, I have no objection to having separate variables in XLogCtl.

> So what I do in the attached 0001 is stop using the XLogwrtResult struct
> in XLogCtl and replace it with separate Write and Flush values, and add
> the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> and Flush from the shared XLogCtl to the local variable given as macro
> argument.  (I also added our idiomatic do {} while(0) to the macro
> definition, for safety).  The new members are XLogCtl->logWriteResult
> and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> essentially identical semantics as the previous code.  No atomic access
> yet!

+1.

> 0002 then adds pg_atomic_monotonic_advance_u64.  (I don't add the _u32
> variant, because I don't think it's a great idea to add dead code.  If
> later we see a need for it we can put it in.)  It also changes the two
> new members to be atomics, changes the macro to use atomic read, and
> XLogWrite now uses monotonic increment.  A couple of other places can
> move the macro calls to occur outside the spinlock.  Also, XLogWrite
> gains the invariant checking that involves Write and Flush.

I'm fine with not having the 32 bit variant of
pg_atomic_monotonic_advance. However, a recent commit bd5132db5 added
both 32 and 64 bit versions of pg_atomic_read_membarrier even though
32 bit isn't being used.

> Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and
> Flush, and updates WALReadFromBuffers to test that instead of the Write
> pointer, and adds in XLogWrite the invariant checks that involve the
> Copy pointer.

+1.

The attached patches look good to me.

Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to
RefreshXLogWriteResult().

> I haven't rerun Bharath test loop yet; will do so shortly.

I ran it a 100 times [1] on top of all the 3 patches, it looks fine.

[1] for i in {1..100}; do make check
PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

./configure --prefix=$PWD/pg17/ --enable-debug --enable-tap-tests
--enable-cassert CC=/usr/bin/clang-14 > install.log && make -j 8
install > install.log 2>&1 &

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Is it safe to cache data by GiST consistent function

2024-04-03 Thread Tom Lane
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?=  writes:
> On 3 Apr 2024, at 16:27, Tom Lane  wrote:
>> AFAIK it works.  I don't see any of the in-core ones doing so,
>> but at least range_gist_consistent and multirange_gist_consistent
>> are missing a bet by repeating their cache search every time.

> pg_trgm consistent caches tigrams but it has some logic to make sure cached 
> values are recalculated:

>   cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
>   if (cache == NULL ||
>   cache->strategy != strategy ||
>   VARSIZE(cache->query) != querysize ||
>   memcmp((char *) cache->query, (char *) query, querysize) != 0)

> What I don’t understand is if it is necessary or it is enough to check 
> fn_extra==NULL.

Ah, I didn't think to search contrib.  Yes, you need to validate the
cache entry.  In this example, a rescan could insert a new query
value.  In general, an opclass support function could get called using
a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
entry), so it's unwise to assume that cached data is relevant to the
current call without checking.

regards, tom lane




Re: Streaming read-ready sequential scan code

2024-04-03 Thread Melanie Plageman
On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas  wrote:
>
> On 01/04/2024 22:58, Melanie Plageman wrote:
> > Attached v7 has version 14 of the streaming read API as well as a few
> > small tweaks to comments and code.
>
> I saw benchmarks in this thread to show that there's no regression when
> the data is in cache, but I didn't see any benchmarks demonstrating the
> benefit of this. So I ran this quick test:

Good point! It would be good to show why we would actually want this
patch. Attached v8 is rebased over current master (which now has the
streaming read API).

On the topic of BAS_BULKREAD buffer access strategy, I think the least
we could do is add an assert like this to read_stream_begin_relation()
after calculating max_pinned_buffers.

Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);

Perhaps we should do more? I think with a ring size of 16 MB, large
SELECTs are safe for now. But I know future developers will make
changes and it would be good not to assume they will understand that
pinning more buffers than the size of the ring effectively invalidates
the ring.

- Melanie
From cfccafec650a77c53b1d78180b52db31742181ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 27 Mar 2024 20:25:06 -0400
Subject: [PATCH v8 3/3] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the read stream object in heap_beginscan(). On rescan, reset
the stream by releasing all pinned buffers and resetting the prefetch
block.

Implement a callback returning the next block to prefetch to the
read stream infrastructure.

Invoke the read stream API when a new page is needed. When the scan
direction changes, reset the stream.
---
 src/backend/access/heap/heapam.c | 94 
 src/include/access/heapam.h  | 15 +
 2 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6b26f5bf8af..3546f637c13 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -221,6 +221,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * 
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+		   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+		   scan->rs_prefetch_block,
+		   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* 
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * 
@@ -323,6 +342,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -459,12 +485,14 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the next block of the relation into a buffer and returns
- * with that pinned buffer saved in the scan descriptor.
+ * This routine gets gets the next block of the relation from the read stream
+ * and saves that pinned buffer in the scan descriptor.
  */
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -479,19 +507,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-		   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	s

Re: RFC: Additional Directory for Extensions

2024-04-03 Thread Christoph Berg
> Fwiw, I wrote this patch to solve the problem of testing extensions at
> build-time where the build process does not have write access to
> PGSHAREDIR. It solves that problem quite well, almost all PG
> extensions have build-time test coverage now (where there was
> basically 0 before).

Also, it's called extension "destdir" because it behaves like DESTDIR
in Makefiles: It prepends the given path to the path that PG is trying
to open when set. So it doesn't allow arbitrary new locations as of
now, just /home/build/foo-1/debian/foo/usr/share/postgresql/17/extension
in addition to /usr/share/postgresql/17/extension. (That is what the
Debian package build process needs, so that restriction/design choice
made sense.)

> Security is not a concern at this point as everything is running as
> the same user, and the test cluster will be wiped right after the
> test. I figured marking the setting as "super user" only was enough
> security at that point, but I would recommend another audit before
> using it together with "trusted" extensions and other things in
> production.

That's also included in the current GUC description:

   This directory is prepended to paths when loading extensions
   (control and SQL files), and to the '$libdir' directive when
   loading modules that back functions. The location is made
   configurable to allow build-time testing of extensions that do not
   have been installed to their proper location yet.

Perhaps I should have included a more verbose "NOT FOR PRODUCTION"
there.

As for compatibility, the patch has been part of the PG 9.5..17 now
for several years, and I'm very happy with extra test coverage it
provides, especially on the Debian architectures that don't have
"autopkgtest" runners yet.

Christoph




Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Melanie Plageman
On Wed, Apr 3, 2024 at 12:34 PM Heikki Linnakangas  wrote:
>
> On 03/04/2024 17:18, Melanie Plageman wrote:
> > I noticed you didn't make the comment updates I suggested in my
> > version 13 here [1]. A few of them are outdated references to
> > heap_page_prune() and one to a now deleted variable name
> > (all_visible_except_removable).
> >
> > I applied them to your v13 and attached the diff.
>
> Applied those changes, and committed. Thank you!

Thanks! And thanks for updating the commitfest entry!

- Melanie




Re: Adding comments to help understand psql hidden queries

2024-04-03 Thread David Christensen
I got Greg's blessing on squashing the commits down, and now including
a v4 with additional improvements on the output formatting front.
Main changes:

- all generated comments are the same width
- width has been bumped to 80
- removed _() functions for consumers of the new output functions

This patch adds two new helper functions, OutputComment() and
OutputCommentStars() to output and wrap the comments to the
appropriate widths.  Everything should continue to work just fine if
we want to adjust the width by just adjusting the MAX_COMMENT_WIDTH
symbol.

I removed _() in the output of the query/stars since there'd be no
sensible existing translations for the constructed string, which
included the query string itself.  If we need it for the "QUERY"
string, this could be added fairly easily, but the existing piece
would have been nonsensical and never used in practice.

Thanks,

David


v4-0001-Improve-SQL-comments-on-echo-hidden-output.patch
Description: Binary data


Re: Use streaming read API in ANALYZE

2024-04-03 Thread Heikki Linnakangas

On 03/04/2024 13:31, Nazir Bilal Yavuz wrote:

Streaming API has been committed but the committed version has a minor
change, the read_stream_begin_relation function takes Relation instead
of BufferManagerRelation now. So, here is a v5 which addresses this
change.


I'm getting a repeatable segfault / assertion failure with this:

postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10);
CREATE TABLE
postgres=# insert into tengiga select g, repeat('x', 900) from 
generate_series(1, 140) g;

INSERT 0 140
postgres=# set default_statistics_target = 10; ANALYZE tengiga;
SET
ANALYZE
postgres=# set default_statistics_target = 100; ANALYZE tengiga;
SET
ANALYZE
postgres=# set default_statistics_target =1000; ANALYZE tengiga;
SET
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File: 
"heapam_handler.c", Line: 1079, PID: 262232
postgres: heikki postgres [local] 
ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8]
postgres: heikki postgres [local] 
ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34]

postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34]
postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a]
postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9]
postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0]
postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe]
postgres: heikki postgres [local] 
ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9]
postgres: heikki postgres [local] 
ANALYZE(ProcessUtility+0x136)[0x564889f0afb1]

postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8]
postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b]
postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015]
postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6]
postgres: heikki postgres [local] 
ANALYZE(PostgresMain+0x80c)[0x564889f06fd7]

postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876]
postgres: heikki postgres [local] 
ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3]

postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e]
postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0]
postgres: heikki postgres [local] 
ANALYZE(PostmasterMain+0x152b)[0x564889e2214d]

postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305]
postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61]
2024-04-03 20:15:49.157 EEST [262101] LOG:  server process (PID 262232) 
was terminated by signal 6: Aborted


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Statistics Import and Export

2024-04-03 Thread Tom Lane
Corey Huinker  writes:
> - functions strive to not ERROR unless absolutely necessary. The biggest
> exposure is the call to array_in().

As far as that goes, it shouldn't be that hard to deal with, at least
not for "soft" errors which hopefully cover most input-function
failures these days.  You should be invoking array_in via
InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext.
(Look at pg_input_error_info() for useful precedent.)

There might be something to be said for handling all the error
cases via an ErrorSaveContext and use of ereturn() instead of
ereport().  Not sure if it's worth the trouble or not.

regards, tom lane




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Jacob Champion
On Wed, Apr 3, 2024 at 8:29 AM Tom Lane  wrote:
> I count 3 machines running 1.0.1, 18 running some flavor
> of 1.0.2, and 7 running various LibreSSL versions.

I don't know all the tradeoffs with buildfarm wrangling, but IMO all
those 1.0.2 installations are the most problematic, so I dug in a bit:

  arowana CentOS 7
  batfish Ubuntu 16.04.3
  boa RHEL 7
  buri CentOS 7
  butterflyfish Photon 2.0
  clam RHEL 7.1
  cuon Ubuntu 16.04
  dhole CentOS 7.4
  hake OpenIndiana hipster
  mantid CentOS 7.9
  margay Solaris 11.4.42
  massasauga Amazon Linux 2
  myna Photon 3.0
  parula Amazon Linux 2
  rhinoceros CentOS 7.1
  shelduck SUSE 12SP5
  siskin RHEL 7.9
  snakefly Amazon Linux 2

The RHEL7-alikes are the biggest set, but that's already discussed
above. Looks like SUSE 12 goes EOL later this year (October 2024), and
it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu
16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March
2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of
which appear to have newer versions of OpenSSL shipped and selectable.

--Jacob




Re: Improve eviction algorithm in ReorderBuffer

2024-04-03 Thread Jeff Davis
On Wed, 2024-04-03 at 01:45 -0700, Jeff Davis wrote:
> I suggest that you add a "heap_index" field to ReorderBufferTXN that
> would point to the index into the heap's array (the same as
> bh_nodeidx_entry.index in your patch). Each time an element moves
> within the heap array, just follow the pointer to the
> ReorderBufferTXN
> object and update the heap_index -- no hash lookup required.

It looks like my email was slightly too late, as the work was already
committed.

My suggestion is not required for 17, and so it's fine if this waits
until the next CF. If it turns out to be a win we can consider
backporting to 17 just to keep the code consistent, otherwise it can go
in 18.

Regards,
Jeff Davis





Re: Psql meta-command conninfo+

2024-04-03 Thread Imseih (AWS), Sami
Building the docs fail for v26. The error is:



ref/psql-ref.sgml:1042: element member: validity error : Element term is not 
declared in member list of possible children

 

  ^



I am able to build up to v24 before the  was replaced with 




I tested building with a modified structure as below; the  is a 

that has a  within it.  Each field is a simple list member and

the the name of the fields should be .



  

   \conninfo[+]

   

   

Outputs a string displaying information about the current

database connection. When + is appended,

more details about the connection are displayed in table

format:

   

   Database:The database name of the 
connection.

   Authenticated User:The authenticated database 
user of the connection.

   Current User:The user name of the current 
execution context;

  see the current_user() function in

   for more 
details.

   

   

   

  



Regards,



Sami


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Tom Lane
Jacob Champion  writes:
> The RHEL7-alikes are the biggest set, but that's already discussed
> above. Looks like SUSE 12 goes EOL later this year (October 2024), and
> it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu
> 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March
> 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of
> which appear to have newer versions of OpenSSL shipped and selectable.

The discussion we had last year concluded that we were OK with
dropping 1.0.1 support when RHEL6 goes out of extended support
(June 2024 per this thread, I didn't check it).  Seems like we
should have the same policy for RHEL7.  Also, calling Photon 3
dead because it went EOL three days ago seems over-hasty.

Bottom line for me is that pulling 1.0.1 support now is OK,
but I think pulling 1.0.2 is premature.

regards, tom lane




Re: Popcount optimization using AVX512

2024-04-03 Thread Nathan Bossart
I committed v23-0001.  Here is a rebased version of the remaining patches.
I intend to test the masking idea from Ants next.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 295b03530de5f42fe876b4489191da2f8dc83194 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [PATCH v24 1/2] AVX512 popcount support

---
 config/c-compiler.m4 |  58 ++
 configure| 252 +++
 configure.ac |  51 ++
 meson.build  |  87 +
 src/Makefile.global.in   |   5 +
 src/include/pg_config.h.in   |  12 ++
 src/include/port/pg_bitutils.h   |  15 ++
 src/makefiles/meson.build|   4 +-
 src/port/Makefile|  11 ++
 src/port/meson.build |   6 +-
 src/port/pg_bitutils.c   |  29 ++-
 src/port/pg_popcount_avx512.c|  49 ++
 src/port/pg_popcount_avx512_choose.c |  71 
 src/test/regress/expected/bit.out|  24 +++
 src/test/regress/sql/bit.sql |   4 +
 15 files changed, 673 insertions(+), 5 deletions(-)
 create mode 100644 src/port/pg_popcount_avx512.c
 create mode 100644 src/port/pg_popcount_avx512_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb..5fb60775ca 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+
+# PGAC_XSAVE_INTRINSICS
+# -
+# Check if the compiler supports the XSAVE instructions using the _xgetbv
+# intrinsic function.
+#
+# An optional compiler flag can be passed as argument (e.g., -mxsave).  If the
+# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE.
+AC_DEFUN([PGAC_XSAVE_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [return _xgetbv(0) & 0xe0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_XSAVE="$1"
+  pgac_xsave_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_XSAVE_INTRINSICS
+
+# PGAC_AVX512_POPCNT_INTRINSICS
+# -
+# Check if the compiler supports the AVX512 POPCNT instructions using the
+# _mm512_setzero_si512, _mm512_loadu_si512, _mm512_popcnt_epi64,
+# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions.
+#
+# An optional compiler flag can be passed as argument
+# (e.g., -mavx512vpopcntdq).  If the intrinsics are supported, sets
+# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT.
+AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [const char buf@<:@sizeof(__m512i)@:>@;
+   PG_INT64_TYPE popcnt = 0;
+   __m512i accum = _mm512_setzero_si512();
+   const __m512i val = _mm512_loadu_si512((const __m512i *) buf);
+   const __m512i cnt = _mm512_popcnt_epi64(val);
+   accum = _mm512_add_epi64(accum, cnt);
+   popcnt = _mm512_reduce_add_epi64(accum);
+   /* return computed value, to prevent the above being optimized away */
+   return popcnt == 0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_POPCNT="$1"
+  pgac_avx512_popcnt_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_AVX512_POPCNT_INTRINSICS
diff --git a/configure b/configure
index 36feeafbb2..b48ed7f271 100755
--- a/configure
+++ b/configure
@@ -647,6 +647,9 @@ MSGFMT_FLAGS
 MSGFMT
 PG_CRC32C_OBJS
 CFLAGS_CRC
+PG_POPCNT_OBJS
+CFLAGS_POPCNT
+CFLAGS_XSAVE
 LIBOBJS
 OPENSSL
 ZSTD
@@ -17404,6 +17407,40 @@ $as_echo "#define HAVE__GET_CPUID 1" >>confdefs.h
 
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __get_cpuid_count" >&5
+$as_echo_n "checking for __get_cpuid_count... " >&6; }
+if ${pgac_cv__get_cpuid_count+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+unsigned int exx[4] = {0, 0, 0, 0};
+  __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__get_cpuid_count="yes"
+else
+  pgac_cv__get_cpuid_count="no"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__get_cpuid_count" >&5
+$as_echo "$pgac_cv__get_cpuid_count" >&6; }
+if test x"$pgac_cv__get_cpuid_count" = x"yes"; then
+
+$as_echo "#define 

Re: Can't compile PG 17 (master) from git under Msys2 autoconf

2024-04-03 Thread Alvaro Herrera
Hi Regina,

On 2024-Mar-27, Regina Obe wrote:

> The error is 
> 
> rm -f '../../src/include/storage/lwlocknames.h'
> cp -pR ../../backend/storage/lmgr/lwlocknames.h
> '../../src/include/storage/lwlocknames.h'
> cp: cannot stat '../../backend/storage/lmgr/lwlocknames.h': No such file or
> directory
> make[1]: *** [Makefile:143: ../../src/include/storage/lwlocknames.h] Error 1
> make[1]: Leaving directory '/projects/postgresql/postgresql-git/src/backend'
> make: *** [../../src/Makefile.global:382: submake-generated-headers] Error 2

Hmm, I changed these rules again in commit da952b415f44, maybe your
problem is with that one?  I wonder if changing the references to
../include/storage/lwlocklist.h to
$(topdir)/src/include/storage/lwlocklist.h (and similar things in
src/backend/storage/lmgr/Makefile) would fix it.

Do you run configure in the source directory or a separate build one?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Jacob Champion
On Wed, Apr 3, 2024 at 10:38 AM Tom Lane  wrote:
> Also, calling Photon 3
> dead because it went EOL three days ago seems over-hasty.

Well, March 1, but either way I thought "dead" for the purposes of
this thread meant "you can't build the very latest version of Postgres
on it", not "we've forgotten it exists". Back branches will continue
to need support and testing.

> Bottom line for me is that pulling 1.0.1 support now is OK,
> but I think pulling 1.0.2 is premature.

Okay, but IIUC, waiting for it to drop out of extended support means
we deal with it for four more years. That seems excessive.

--Jacob




Re: Security lessons from liblzma - libsystemd

2024-04-03 Thread Andres Freund
Hi,

As most will know by now, the way xz debacle was able to make sshd vulnerable
was through a dependency from sshd to libsystemd and then from libsystemd to
liblzma. One lesson from this is that unnecessary dependencies can still
increase risk.

It's worth noting that we have an optional dependency on libsystemd as well.

Openssh has now integrated [1] a patch to remove the dependency on libsystemd
for triggering service manager readyness notifications, by inlining the
necessary function. That's not hard, the protocol is pretty simple.

I suspect we should do the same. We're not even close to being a target as
attractive as openssh, but still, it seems unnecessary.

Intro into the protocol is at [2], with real content and outline of the
relevant code at [3].


An argument could be made to instead just remove support, but I think it's
quite valuable to have intra service dependencies that can rely on the server
actually having started up.

Greetings,

Andres Freund

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=2641
[2] 
https://www.freedesktop.org/software/systemd/man/devel/systemd.html#Readiness%20Protocol
[3] https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes




Re: LogwrtResult contended spinlock

2024-04-03 Thread Alvaro Herrera
On 2024-Apr-03, Bharath Rupireddy wrote:

> On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera  wrote:

> > So what I do in the attached 0001 is stop using the XLogwrtResult struct
> > in XLogCtl and replace it with separate Write and Flush values, and add
> > the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> > and Flush from the shared XLogCtl to the local variable given as macro
> > argument.  (I also added our idiomatic do {} while(0) to the macro
> > definition, for safety).  The new members are XLogCtl->logWriteResult
> > and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> > essentially identical semantics as the previous code.  No atomic access
> > yet!
> 
> +1.

> Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to
> RefreshXLogWriteResult().

Okay, I have pushed 0001 with the name change, will see about getting
the others in tomorrow.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Security lessons from liblzma

2024-04-03 Thread Peter Eisentraut

On 30.03.24 00:14, Daniel Gustafsson wrote:

One take-away for me is how important it is to ship recipes for regenerating
any testdata which is included in generated/compiled/binary format.  Kind of
how we in our tree ship the config for test TLS certificates and keys which can
be manually inspected, and used to rebuild the testdata (although the risk for
injections in this particular case seems low).  Bad things can still be
injected, but formats which allow manual review at least goes some way towards
lowering risk.


I actually find the situation with the test TLS files quite 
unsatisfactory.  While we have build rules, the output files are not 
reproducible, both because of some inherent randomness in the 
generation, and because different OpenSSL versions produce different 
details.  So you just have to "trust" that what's there now makes sense. 
 Of course, you can use openssl tools to unpack these files, but that 
is difficult and unreliable unless you know exactly what you are looking 
for.  Also, for example, do we even know whether all the files that are 
there now are even used by any tests?


A few years ago, some guy on the internet sent in a purported update to 
one of the files [0], which I ended up committing, but I remember that 
that situation gave me quite some pause at the time.


It would be better if we created the required test files as part of the 
test run.  (Why not?  Too slow?)  Alternatively, I have been thinking 
that maybe we could make the output more reproducible by messing with 
whatever random seed OpenSSL uses.  Or maybe use a Python library to 
create the files.  Some things to think about.


[0]: 
https://www.postgresql.org/message-id/FEF81714-D479-4512-839B-C769D2605F8A%40yesql.se






Re: Statistics Import and Export

2024-04-03 Thread Corey Huinker
>
>
> As far as that goes, it shouldn't be that hard to deal with, at least
> not for "soft" errors which hopefully cover most input-function
> failures these days.  You should be invoking array_in via
> InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext.
> (Look at pg_input_error_info() for useful precedent.)
>

Ah, my understanding may be out of date. I was under the impression that
that mechanism relied on the the cooperation of the per-element input
function, so even if we got all the builtin datatypes to play nice with
*Safe(), we were always going to be at risk with a user-defined input
function.


> There might be something to be said for handling all the error
> cases via an ErrorSaveContext and use of ereturn() instead of
> ereport().  Not sure if it's worth the trouble or not.
>

It would help us tailor the user experience. Right now we have several
endgames. To recap:

1. NULL input =>  Return NULL. (because strict).
2. Actual error (permissions, cache lookup not found, etc) => Raise ERROR
(thus ruining binary upgrade)
3. Call values are so bad (examples: attname not found, required stat
missing) that nothing can recover => WARN, return FALSE.
4. At least one stakind-stat is wonky (impossible for datatype, missing
stat pair, wrong type on input parameter), but that's the worst of it => 1
to N WARNs, write stats that do make sense, return TRUE.
5. Hunky-dory. => No warns. Write all stats. return TRUE.

Which of those seem like good ereturn candidates to you?


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Tom Lane
Jacob Champion  writes:
> On Wed, Apr 3, 2024 at 10:38 AM Tom Lane  wrote:
>> Bottom line for me is that pulling 1.0.1 support now is OK,
>> but I think pulling 1.0.2 is premature.

> Okay, but IIUC, waiting for it to drop out of extended support means
> we deal with it for four more years. That seems excessive.

wikipedia says that RHEL7 ends ELS as of June 2026 [1].

regards, tom lane

[1] https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Product_life_cycle




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alexander Korotkov
On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera  wrote:
>
> On 2024-Apr-03, Alexander Korotkov wrote:
>
> > Regarding the shmem data structure for LSN waiters.  I didn't pick
> > LWLock or ConditionVariable, because I needed the ability to wake up
> > only those waiters whose LSN is already replayed.  In my experience
> > waking up a process is way slower than scanning a short flat array.
>
> I agree, but I think that's unrelated to what I was saying, which is
> just the patch I attach here.

Oh, sorry for the confusion.  I'd re-read your message.  Indeed you
meant this very clearly!

I'm good with the patch.  Attached revision contains a bit of a commit message.

> > However, I agree that when the number of waiters is very high and flat
> > array may become a problem.  It seems that the pairing heap is not
> > hard to use for shmem structures.  The only memory allocation call in
> > paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> > be able to initialize the pairing heap in-place, and it will be fine
> > for shmem.
>
> Ok.
>
> With the code as it stands today, everything in WaitLSNState apart from
> the pairing heap is accessed without any locking.  I think this is at
> least partly OK because each backend only accesses its own entry; but it
> deserves a comment.  Or maybe something more, because WaitLSNSetLatches
> does modify the entry for other backends.  (Admittedly, this could only
> happens for backends that are already sleeping, and it only happens
> with the lock acquired, so it's probably okay.  But clearly it deserves
> a comment.)

Please, check 0002 patch attached.  I found it easier to move two
assignments we previously moved out of lock, into the lock; then claim
WaitLSNState.procInfos is also protected by WaitLSNLock.

> Don't we need to WaitLSNCleanup() during error recovery or something?

Yes, there is WaitLSNCleanup(). It's currently only called from one
place, given that waiting for LSN can't be invoked from background
workers or inside the transaction.

--
Regards,
Alexander Korotkov




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Jacob Champion
On Wed, Apr 3, 2024 at 11:13 AM Tom Lane  wrote:
> wikipedia says that RHEL7 ends ELS as of June 2026 [1].

I may have misunderstood something in here then:


https://www.redhat.com/en/blog/announcing-4-years-extended-life-cycle-support-els-red-hat-enterprise-linux-7

> ELS for RHEL 7 is now available for 4 years, starting on July 1, 2024.

Am I missing something?

--Jacob




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-04-03 Thread Andres Freund
Hi,

On 2024-02-14 16:23:38 +0900, Michael Paquier wrote:
> It is possible to retrieve this information some WITH RECURSIVE as well, as
> mentioned upthread.  Perhaps we could consider documenting these tricks?

I think it's sufficiently hard that it's not a reasonable way to do this.

Greetings,

Andres Freund




Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Peter Geoghegan
On Wed, Apr 3, 2024 at 1:04 PM Melanie Plageman
 wrote:
> Thanks! And thanks for updating the commitfest entry!

Nice work!

-- 
Peter Geoghegan




Re: Use streaming read API in ANALYZE

2024-04-03 Thread Nazir Bilal Yavuz
Hi Jakub,

Thank you for looking into this and doing a performance analysis.

On Wed, 3 Apr 2024 at 11:42, Jakub Wartak  wrote:
>
> On Tue, Apr 2, 2024 at 9:24 AM Nazir Bilal Yavuz  wrote:
> [..]
> > v4 is rebased on top of v14 streaming read API changes.
>
> Hi Nazir, so with streaming API committed, I gave a try to this patch.
> With autovacuum=off and 30GB table on NVMe (with standard readahead of
> 256kb and ext4, Debian 12, kernel 6.1.0, shared_buffers = 128MB
> default) created using: create table t as select repeat('a', 100) || i
> || repeat('b', 500) as filler from generate_series(1, 4500) as i;
>
> on master, effect of mainteance_io_concurency [default 10] is like
> that (when resetting the fs cache after each ANALYZE):
>
> m_io_c = 0:
> Time: 3137.914 ms (00:03.138)
> Time: 3094.540 ms (00:03.095)
> Time: 3452.513 ms (00:03.453)
>
> m_io_c = 1:
> Time: 2972.751 ms (00:02.973)
> Time: 2939.551 ms (00:02.940)
> Time: 2904.428 ms (00:02.904)
>
> m_io_c = 2:
> Time: 1580.260 ms (00:01.580)
> Time: 1572.132 ms (00:01.572)
> Time: 1558.334 ms (00:01.558)
>
> m_io_c = 4:
> Time: 938.304 ms
> Time: 931.772 ms
> Time: 920.044 ms
>
> m_io_c = 8:
> Time: 666.025 ms
> Time: 660.241 ms
> Time: 648.848 ms
>
> m_io_c = 16:
> Time: 542.450 ms
> Time: 561.155 ms
> Time: 539.683 ms
>
> m_io_c = 32:
> Time: 538.487 ms
> Time: 541.705 ms
> Time: 538.101 ms
>
> with patch applied:
>
> m_io_c = 0:
> Time: 3106.469 ms (00:03.106)
> Time: 3140.343 ms (00:03.140)
> Time: 3044.133 ms (00:03.044)
>
> m_io_c = 1:
> Time: 2959.817 ms (00:02.960)
> Time: 2920.265 ms (00:02.920)
> Time: 2911.745 ms (00:02.912)
>
> m_io_c = 2:
> Time: 1581.912 ms (00:01.582)
> Time: 1561.444 ms (00:01.561)
> Time: 1558.251 ms (00:01.558)
>
> m_io_c = 4:
> Time: 908.116 ms
> Time: 901.245 ms
> Time: 901.071 ms
>
> m_io_c = 8:
> Time: 619.870 ms
> Time: 620.327 ms
> Time: 614.266 ms
>
> m_io_c = 16:
> Time: 529.885 ms
> Time: 526.958 ms
> Time: 528.474 ms
>
> m_io_c = 32:
> Time: 521.185 ms
> Time: 520.713 ms
> Time: 517.729 ms
>
> No difference to me, which seems to be good. I've double checked and
> patch used the new way
>
> acquire_sample_rows -> heapam_scan_analyze_next_block ->
> ReadBufferExtended -> ReadBuffer_common (inlined) -> WaitReadBuffers
> -> mdreadv -> FileReadV -> pg_preadv (inlined)
> acquire_sample_rows -> heapam_scan_analyze_next_block ->
> ReadBufferExtended -> ReadBuffer_common (inlined) -> StartReadBuffer
> -> ...
>
> I gave also io_combine_limit to 32 (max, 256kB) a try and got those
> slightly better results:
>
> [..]
> m_io_c = 16:
> Time: 494.599 ms
> Time: 496.345 ms
> Time: 973.500 ms
>
> m_io_c = 32:
> Time: 461.031 ms
> Time: 449.037 ms
> Time: 443.375 ms
>
> and that (last one) apparently was able to push it to ~50-60k still
> random IOPS range, the rareq-sz was still ~8 (9.9) kB as analyze was
> still reading random , so I assume no merging was done:
>
> Devicer/s rMB/s   rrqm/s  %rrqm r_await rareq-sz
> w/s wMB/s   wrqm/s  %wrqm w_await wareq-sz d/s dMB/s
> drqm/s  %drqm d_await dareq-sz f/s f_await  aqu-sz  %util
> nvme0n1   61212.00591.82 0.00   0.000.10 9.90
> 2.00  0.02 0.00   0.000.0012.000.00  0.00
> 0.00   0.000.00 0.000.000.006.28  85.20
>
> So in short it looks good to me.

My results are similar to yours, also I realized a bug while working
on your benchmarking cases. I will share the cause and the fix soon.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Pavel Borisov
Hi, Alexander!

On Wed, 3 Apr 2024 at 22:18, Alexander Korotkov 
wrote:

> On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera 
> wrote:
> >
> > On 2024-Apr-03, Alexander Korotkov wrote:
> >
> > > Regarding the shmem data structure for LSN waiters.  I didn't pick
> > > LWLock or ConditionVariable, because I needed the ability to wake up
> > > only those waiters whose LSN is already replayed.  In my experience
> > > waking up a process is way slower than scanning a short flat array.
> >
> > I agree, but I think that's unrelated to what I was saying, which is
> > just the patch I attach here.
>
> Oh, sorry for the confusion.  I'd re-read your message.  Indeed you
> meant this very clearly!
>
> I'm good with the patch.  Attached revision contains a bit of a commit
> message.
>
> > > However, I agree that when the number of waiters is very high and flat
> > > array may become a problem.  It seems that the pairing heap is not
> > > hard to use for shmem structures.  The only memory allocation call in
> > > paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> > > be able to initialize the pairing heap in-place, and it will be fine
> > > for shmem.
> >
> > Ok.
> >
> > With the code as it stands today, everything in WaitLSNState apart from
> > the pairing heap is accessed without any locking.  I think this is at
> > least partly OK because each backend only accesses its own entry; but it
> > deserves a comment.  Or maybe something more, because WaitLSNSetLatches
> > does modify the entry for other backends.  (Admittedly, this could only
> > happens for backends that are already sleeping, and it only happens
> > with the lock acquired, so it's probably okay.  But clearly it deserves
> > a comment.)
>
> Please, check 0002 patch attached.  I found it easier to move two
> assignments we previously moved out of lock, into the lock; then claim
> WaitLSNState.procInfos is also protected by WaitLSNLock.
>
Could you re-attach 0002. Seems it failed to attach to the previous
message.

Regards,
Pavel


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 19:38, Tom Lane  wrote:
> 
> Jacob Champion  writes:
>> The RHEL7-alikes are the biggest set, but that's already discussed
>> above. Looks like SUSE 12 goes EOL later this year (October 2024), and
>> it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu
>> 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March
>> 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of
>> which appear to have newer versions of OpenSSL shipped and selectable.
> 
> The discussion we had last year concluded that we were OK with
> dropping 1.0.1 support when RHEL6 goes out of extended support
> (June 2024 per this thread, I didn't check it).  Seems like we
> should have the same policy for RHEL7.  Also, calling Photon 3
> dead because it went EOL three days ago seems over-hasty.
> 
> Bottom line for me is that pulling 1.0.1 support now is OK,
> but I think pulling 1.0.2 is premature.

Is Red Hat building and and shipping v17 packages for RHEL7 ELS customers?  If
not then it seems mostly academical to tie our dependencies to RHEL ELS unless
I'm missing something.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 17:29, Tom Lane  wrote:
> 
> Jacob Champion  writes:
>> As far as I can tell, no versions of LibreSSL so far provide
>> X509_get_signature_info(), so this patch is probably a bit too
>> aggressive.
> 
> Another problem with cutting support is how many buildfarm members
> will we lose.  I scraped recent configure logs and got the attached
> results.  I count 3 machines running 1.0.1,

Support for 1.0.1 was removed with 8e278b657664 in July 2023 so those are not
building with OpenSSL enabled already.

> 18 running some flavor of 1.0.2,

massasauga and snakefly run the ssl_passphrase_callback-check test but none of
these run the ssl-check tests AFAICT, so we have very low coverage as is.  The
fact that very few animals run the ssl tests is a pet peeve of mine, it would
be nice if we could get broader coverage there.

Worth noting is that the OpenSSL check in configure.ac only reports what the
version of the OpenSSL binary in $PATH is, not which version of the library
that we build against (using --with-libs/--with-includes etc).

--
Daniel Gustafsson





Re: On disable_cost

2024-04-03 Thread Robert Haas
On Tue, Apr 2, 2024 at 12:58 PM Tom Lane  wrote:
> Not really.  But if you had, say, a join of a promoted path to a
> disabled path, that would be treated as on-par with a join of two
> regular paths, which seems like it'd lead to odd choices.  Maybe
> it'd be fine, but my gut says it'd likely not act nicely.  As you
> say, it's a lot easier to believe that only-promoted or only-disabled
> situations would behave sanely.

Makes sense.

I wanted to further explore the idea of just not generating plans of
types that are currently disabled. I looked into doing this for
enable_indexscan and enable_indexonlyscan. As a first step, I
investigated how those settings work now, and was horrified. I don't
know whether I just wasn't paying attention back when the original
index-only scan work was done -- I remember discussing
enable_indexonlyscan with you at the time -- or whether it got changed
subsequently. Anyway, the current behavior is:

[A] enable_indexscan=false adds disable_cost to the cost of every
Index Scan path *and also* every Index-Only Scan path. So disabling
index-scans also in effect discourages the use of index-only scans,
which would make sense if we didn't have a separate setting called
enable_indexonlyscan, but we do. Given that, I think this is
completely and utterly wrong.

[b] enable_indexonlyscan=false causes index-only scan paths not to be
generated at all, but instead, we generate index-scan paths to do the
same thing that we would not have generated otherwise. This is weird
because it means that disabling one plan type causes us to consider
additional plans of another type, which seems like a thing that a user
might not expect. It's more defensible than [A], though, because you
could argue that we only omit the index scan path as an optimization,
on the theory that it will always lose to the index-only scan path,
and thus if the index-only scan path is not generated, there's a point
to generating the index scan path after all, so we should. However, it
seems unlikely to me that someone reading the one line of
documentation that we have about this parameter would be able to guess
that it works this way.

Here's an example of how the current system behaves:

robert.haas=# explain select count(*) from pgbench_accounts;
   QUERY PLAN
-
 Aggregate  (cost=2854.29..2854.30 rows=1 width=8)
   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=0.29..2604.29 rows=10 width=0)
(2 rows)

robert.haas=# set enable_indexscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
  QUERY PLAN
--
 Aggregate  (cost=2890.00..2890.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2640.00 rows=10 width=0)
(2 rows)

robert.haas=# set enable_seqscan=false;
SET
robert.haas=# set enable_bitmapscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
QUERY PLAN
--
 Aggregate  (cost=1002854.29..1002854.30 rows=1 width=8)
   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=100.29..1002604.29 rows=10 width=0)
(2 rows)

robert.haas=# set enable_indexonlyscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
  QUERY PLAN
---
 Aggregate  (cost=1002890.00..1002890.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts
(cost=100.00..1002640.00 rows=10 width=0)
(2 rows)

The first time we run the query, it picks an index-only scan because
it's the cheapest. When index scans are disabled, the query now picks
a sequential scan, even though it wasn't use an index-only scan,
because the index scan that it was using is perceived to have become
very expensive. When we then shut off sequential scans and bitmap-only
scans, it switches back to an index-only scan, because setting
enable_indexscan=false didn't completely disable index-only scans, but
just made them expensive. But now everything looks expensive, so we go
back to the same plan we had initially, except with the cost increased
by a bazillion. Finally, when we disable index-only scans, that
removes that plan from the pool, so now we pick the second-cheapest
plan overall, which in this case is a sequential scan.

So just to see what would happen, I wrote a patch to make
enable_indexscan and enable_indexonlyscan do exactly what they say on
the tin: when you set one of them to false, paths of that type are not
generated, a

Re: Use streaming read API in ANALYZE

2024-04-03 Thread Nazir Bilal Yavuz
Hi,

Thank you for looking into this!

On Wed, 3 Apr 2024 at 20:17, Heikki Linnakangas  wrote:
>
> On 03/04/2024 13:31, Nazir Bilal Yavuz wrote:
> > Streaming API has been committed but the committed version has a minor
> > change, the read_stream_begin_relation function takes Relation instead
> > of BufferManagerRelation now. So, here is a v5 which addresses this
> > change.
>
> I'm getting a repeatable segfault / assertion failure with this:
>
> postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10);
> CREATE TABLE
> postgres=# insert into tengiga select g, repeat('x', 900) from
> generate_series(1, 140) g;
> INSERT 0 140
> postgres=# set default_statistics_target = 10; ANALYZE tengiga;
> SET
> ANALYZE
> postgres=# set default_statistics_target = 100; ANALYZE tengiga;
> SET
> ANALYZE
> postgres=# set default_statistics_target =1000; ANALYZE tengiga;
> SET
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File:
> "heapam_handler.c", Line: 1079, PID: 262232
> postgres: heikki postgres [local]
> ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8]
> postgres: heikki postgres [local]
> ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34]
> postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34]
> postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a]
> postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9]
> postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0]
> postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe]
> postgres: heikki postgres [local]
> ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9]
> postgres: heikki postgres [local]
> ANALYZE(ProcessUtility+0x136)[0x564889f0afb1]
> postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8]
> postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b]
> postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015]
> postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6]
> postgres: heikki postgres [local]
> ANALYZE(PostgresMain+0x80c)[0x564889f06fd7]
> postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876]
> postgres: heikki postgres [local]
> ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3]
> postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e]
> postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0]
> postgres: heikki postgres [local]
> ANALYZE(PostmasterMain+0x152b)[0x564889e2214d]
> postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4]
> /lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305]
> postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61]
> 2024-04-03 20:15:49.157 EEST [262101] LOG:  server process (PID 262232)
> was terminated by signal 6: Aborted

I realized the same error while working on Jakub's benchmarking results.

Cause: I was using the nblocks variable to check how many blocks will
be returned from the streaming API. But I realized that sometimes the
number returned from BlockSampler_Init() is not equal to the number of
blocks that BlockSampler_Next() will return as BlockSampling algorithm
decides how many blocks to return on the fly by using some random
seeds.

There are a couple of solutions I thought of:

1- Use BlockSampler_HasMore() instead of nblocks in the main loop in
the acquire_sample_rows():

Streaming API uses this function to prefetch block numbers.
BlockSampler_HasMore() will reach to the end first as it is used while
prefetching, so it will start to return false while there are still
buffers to return from the streaming API. That will cause some buffers
at the end to not be processed.

2- Expose something (function, variable etc.) from the streaming API
to understand if the read is finished and there is no buffer to
return:

I think this works but I am not sure if the streaming API allows
something like that.

3- Check every buffer returned from the streaming API, if it is
invalid stop the main loop in the acquire_sample_rows():

This solves the problem but there will be two if checks for each
buffer returned,
- in heapam_scan_analyze_next_block() to check if the returned buffer is invalid
- to break main loop in acquire_sample_rows() if
heapam_scan_analyze_next_block() returns false
One of the if cases can be bypassed by moving
heapam_scan_analyze_next_block()'s code to the main loop in the
acquire_sample_rows().

I implemented the third solution, here is v6.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Wed, 3 Apr 2024 15:14:15 +0300
Subject: [PATCH v6] Use streaming read API in ANALYZE

ANALYZE command gets random tuples using BlockS

Re: Security lessons from liblzma

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 20:09, Peter Eisentraut  wrote:
> 
> On 30.03.24 00:14, Daniel Gustafsson wrote:
>> One take-away for me is how important it is to ship recipes for regenerating
>> any testdata which is included in generated/compiled/binary format.  Kind of
>> how we in our tree ship the config for test TLS certificates and keys which 
>> can
>> be manually inspected, and used to rebuild the testdata (although the risk 
>> for
>> injections in this particular case seems low).  Bad things can still be
>> injected, but formats which allow manual review at least goes some way 
>> towards
>> lowering risk.
> 
> I actually find the situation with the test TLS files quite unsatisfactory.  
> While we have build rules, the output files are not reproducible, both 
> because of some inherent randomness in the generation, and because different 
> OpenSSL versions produce different details.

This testdata is by nature not reproducible, and twisting arms to make it so
will only result in testing against synthetic data which risk hiding bugs IMO.

> So you just have to "trust" that what's there now makes sense.

Not entirely, you can review the input files which are used to generate the
test data and verify that those make sense..

> Of course, you can use openssl tools to unpack these files, but that is 
> difficult and unreliable unless you know exactly what you are looking for.

..and check like you mention above, but it's as you say not entirely trivial.  
It
would be nice to improve this but I'm not sure how.  Document how to inspect
the data in src/test/ssl/README perhaps?

> It would be better if we created the required test files as part of the test 
> run.  (Why not?  Too slow?)

The make sslfiles step requires OpenSSL 1.1.1, which is higher than what we
require to be installed to build postgres.  The higher-than-base requirement is
due to it building test data only used when running 1.1.1 or higher, so
technically it could be made to work if anyone is interesting in investing time
in 1.0.2.

Time is one aspect, on my crusty old laptop it takes ~2 seconds to regenerate
the files.  That in itself isn't that much, but we've rejected test-time
additions far less than that.  We could however make CI and the Buildfarm run
the regeneration and leave it up to each developer to decide locally?  Or
remove them and regenerate on the first SSL test run and then use the cached
ones after that?

On top of time I have a feeling the regeneration won't run on Windows.  When
it's converted to use Meson then that might be fixed though.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Andrew Dunstan


On 2024-04-03 We 15:12, Daniel Gustafsson wrote:

The
fact that very few animals run the ssl tests is a pet peeve of mine, it would
be nice if we could get broader coverage there.



Well, the only reason for that is that the SSL tests need to be listed 
in PG_TEST_EXTRA, and the only reason for that is that there's a 
possible hazard on multi-user servers. But I bet precious few buildfarm 
animals run in such an environment. Mine don't - I'm the only user.


Maybe we could send out an email to the buildfarm-owners list asking 
people to consider allowing the ssl tests.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: psql not responding to SIGINT upon db reconnection

2024-04-03 Thread Robert Haas
On Wed, Apr 3, 2024 at 11:17 AM Tristan Partin  wrote:
> I think patch 2 makes it worse. The value in -Wswitch is that when new
> enum variants are added, the developer knows the locations to update.
> Adding a default case makes -Wswitch pointless.
>
> Patch 1 is still good. The comment change in patch 2 is good too!

It seems to me that 0001 should either remove the pg_unreachable()
call or change the break to a return, but not both. The commit message
tries to justify doing both by saying that the pg_unreachable() call
doesn't have much value, but there's not much value in omitting
pg_unreachable() from unreachable places, either, so I'm not
convinced.

I agree with Tristan's analysis of 0002.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Statistics Import and Export

2024-04-03 Thread Tom Lane
Corey Huinker  writes:
>> As far as that goes, it shouldn't be that hard to deal with, at least
>> not for "soft" errors which hopefully cover most input-function
>> failures these days.  You should be invoking array_in via
>> InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext.
>> (Look at pg_input_error_info() for useful precedent.)

> Ah, my understanding may be out of date. I was under the impression that
> that mechanism relied on the the cooperation of the per-element input
> function, so even if we got all the builtin datatypes to play nice with
> *Safe(), we were always going to be at risk with a user-defined input
> function.

That's correct, but it's silly not to do what we can.  Also, I imagine
that there is going to be high evolutionary pressure on UDTs to
support soft error mode for COPY, so over time the problem will
decrease --- as long as we invoke the soft error mode.

> 1. NULL input =>  Return NULL. (because strict).
> 2. Actual error (permissions, cache lookup not found, etc) => Raise ERROR
> (thus ruining binary upgrade)
> 3. Call values are so bad (examples: attname not found, required stat
> missing) that nothing can recover => WARN, return FALSE.
> 4. At least one stakind-stat is wonky (impossible for datatype, missing
> stat pair, wrong type on input parameter), but that's the worst of it => 1
> to N WARNs, write stats that do make sense, return TRUE.
> 5. Hunky-dory. => No warns. Write all stats. return TRUE.

> Which of those seem like good ereturn candidates to you?

I'm good with all those behaviors.  On reflection, the design I was
vaguely imagining wouldn't cope with case 4 (multiple WARNs per call)
so never mind that.

regards, tom lane




Re: On disable_cost

2024-04-03 Thread Greg Sabino Mullane
On Wed, Apr 3, 2024 at 3:21 PM Robert Haas  wrote:

> It's also pretty clear to me that the fact that enable_indexscan
> and enable_indexonlyscan work completely differently from each other
> is surprising at best, wrong at worst, but here again, what this patch
> does about that is not above reproach.


Yes, that is wrong, surely there is a reason we have two vars. Thanks for
digging into this: if nothing else, the code will be better for this
discussion, even if we do nothing for now with disable_cost.

Cheers,
Greg


Re: Opportunistically pruning page before update

2024-04-03 Thread Melanie Plageman
On Fri, Jan 26, 2024 at 8:33 PM James Coleman  wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar  wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman  wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman  wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't 
> > > necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after
> > we decided that the updated tuple can not fit in the current page and
> > then we are performing the pruning on the new target page.  Why don't
> > we first perform the pruning on the existing page of the tuple itself?
> >  Or this is already being done before this patch? I could not find
> > such existing pruning so got this question because such pruning can
> > convert many non-hot updates to the HOT update right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.

I finally got back around to looking at this. Sorry for the delay.

I don't feel confident enough to say at a high level whether or not it
is a good idea in the general case to try pruning every block
RelationGetBufferForTuple() considers as a target block.

But, I did have a few thoughts on the implementation:

heap_page_prune_opt() checks PageGetHeapFreeSpace() twice. You will
have already done that in RelationGetBufferForTuple(). And you don't
need even need to do it both of those times because you have a lock
(which is why heap_page_prune_opt() does it twice). This all seems a
bit wasteful. And then, you do it again after pruning.

This made me think, vacuum cares how much space heap_page_prune() (now
heap_page_prune_and_freeze()) freed up. Now if we add another caller
who cares how much space pruning freed up, perhaps it is worth
calculating this while pruning and returning it. I know
PageGetHeapFreeSpace() isn't the most expensive function in the world,
but it also seems like a useful, relevant piece of information to
inform the caller of.

You don't have to implement the above, it was just something I was
thinking about.

Looking at the code also made me wonder if it is worth changing
RelationGetBufferForTuple() to call PageGetHeapFreeSpace() before
taking a lock (which won't give a totally accurate result, but that's
probably okay) and then call heap_page_prune_opt() without a lock when
PageGetHeapFreeSpace() says there isn't enough space.

Also do we want to do GetVisibilityMapPins() before calling
heap_page_prune_opt()? I don't quite get why we do that before knowing
if we are going to actually use the target block in the current code.

Anyway, I'm not sure I like just adding a parameter to
heap_page_prune_opt() to indicate it already has an exclusive lock. It
does a bunch of stuff that was always done without a lock and now you
are doing it with an exclusive lock.

- Melanie




  1   2   >