Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Jelte Fennema-Nio
On Fri, 16 Aug 2024 at 00:39, Jacob Champion
 wrote:
>
> On Thu, Aug 15, 2024 at 3:04 PM Heikki Linnakangas  wrote:
> > Perhaps we should even change it to return
> > 30 for protocol version 3.0, and just leave a note in the docs like
> > "in older versions of libpq, this returned 3 for protocol version 3.0".
>
> I think that would absolutely break current code. It's not uncommon
> (IME) for hand-built clients wrapping libpq to make sure they're not
> talking v2 before turning on some feature, and they're allowed to do
> that with a PQprotocolVersion() == 3 check. A GitHub code search
> brings up examples.

Can you give a link for that code search and/or an example where
someone used it like that in a real setting? The only example I could
find where someone used it at all was psycopg having a unittest for
their python wrapper around this API, and they indeed used == 3.

> As for 30001: I don't see the value in modifying an exported API in
> this way. Especially since we can add a new entry point that will be
> guaranteed not to break anyone, as Robert suggested. I think it's a
> POLA violation at minimum; my understanding was that up until this
> point, the value was incremented during major (incompatible) version
> bumps. And I think other users will have had the same understanding.

The advantage is not introducing yet another API when we already have
one with a great name that no one is currently using. The current API
is in practice just a very convoluted way of writing 3. Also, doing an
== 3 check is obviously problematic: if people use this function they
should be using >= 3 to be compatible with future versions. So if we
ever introduce protocol version 4, then these (afaict theoretical)
users would break anyway.
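
For illustration, a minimal libpq sketch of the two styles of check
being discussed (hypothetical client code, not from any patch in this
thread):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) == CONNECTION_OK)
	{
		/* Fragile: an exact match breaks as soon as the reported
		 * major version changes, e.g. to 4. */
		if (PQprotocolVersion(conn) == 3)
			printf("protocol is exactly 3\n");

		/* Forward-compatible: "at least 3", i.e. not v2. */
		if (PQprotocolVersion(conn) >= 3)
			printf("protocol is 3 or newer\n");
	}
	PQfinish(conn);
	return 0;
}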




Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread Peter Eisentraut

On 15.08.24 19:25, Matthias van de Meent wrote:
Apart from the above issue, I'm -0.5 on what to me equates with 
automated spam to -hackers: the volume of mails would put this around 
the 16th most common sender on -hackers, with about 400 mails/year 
(based on 80 new patches for next CF, and 5 CFs/year, combined with 
Robert's 2023 statistics at [0]).


Yeah, I'd rather not open the can of worms that we send automated emails 
to this list at all.  If we do this, then there will be other requests, 
and why this one and not that one.  If people want to get emails from 
the commitfest app, it should be that you subscribe there and it sends 
those emails to those who want them.


I also don't quite like the suggested contents of such mail: (1) and (2) 
are essentially duplicative information, and because CF's entries' IDs 
are not shown in the app the "with ID " part of (1) is practically 
useless (better use the CFE's title), (3) would best be stored and/or 
integrated in the CFA, as would (4). Additionally, (4) isn't 
canonical/guaranteed to be up-to-date, see above. As for the 
"copy-pastable git commands" suggestion, I'm not sure that's applicable, 
for the same reasons that (4) won't work reliably. CFBot's repo to me 
seems more like an internal implementation detail of CFBot than an
authoritative source of patchset diffs.


I agree.  And this also smells a bit like "my favorite workflow".  Maybe 
start with a blog post or a wiki page if you want to suggest this.






Re: Fix memory counter update in reorderbuffer

2024-08-16 Thread Shlok Kyal
On Wed, 7 Aug 2024 at 11:48, Amit Kapila  wrote:
>
> On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada  wrote:
> >
> > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila  wrote:
> > >
> > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > I found a bug in the memory counter update in reorderbuffer. It was
> > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected.
> > > >
> > > > In ReorderBufferCleanupTXN() we zero the transaction size and then
> > > > free the transaction entry as follows:
> > > >
> > > > /* Update the memory counter */
> > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size);
> > > >
> > > > /* deallocate */
> > > > ReorderBufferReturnTXN(rb, txn);
> > > >
> > >
> > > Why do we need to zero the transaction size explicitly? Shouldn't it
> > > automatically become zero after freeing all the changes?
> >
> > It will become zero after freeing all the changes. However, since
> > updating the max-heap when freeing each change could bring some
> > overhead, we freed the changes without updating the memory counter,
> > and then zeroed it.
> >
>
> I think this should be covered in comments as it is not apparent.
>
> >
> > > BTW, commit 5bec1d6bc5e also introduced
> > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which
> > > is also worth considering while fixing the reported problem. It may
> > > not have the same problem as you have reported but we can consider
> > > whether setting txn size as zero explicitly is required or not.
> >
> > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the
> > same as I mentioned above. And yes, as you mentioned, it doesn't have
> > the same problem that I reported here.
> >
>
> I checked again and found that ReorderBufferResetTXN() first calls
> ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After
> that, it also tries to free the spec_insert change, which should have
> the same problem. So, what saves this path from the same problem?

I tried this scenario and was able to reproduce the crash on HEAD.
I have created a patch for the testcase.
I also tested the same scenario with the latest patch shared by
Sawada-san in [1], and can confirm that it resolves the issue.
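
For reference, the comment Amit asked for above might read something
like this (a sketch around the calls quoted earlier, not committed
wording):

/*
 * Update the memory counter in a single step instead of per change:
 * freeing each change below without updating the counter avoids
 * touching the max-heap once per change, which would add overhead.
 * The counter reaches zero either way once all changes are freed.
 */
ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size);

/* deallocate */
ReorderBufferReturnTXN(rb, txn);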

[1]: 
https://www.postgresql.org/message-id/CAD21AoDHC4Sob%3DNEYTxgu5wd4rzCpSLY_hWapMUqf4WKrAxmyw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal


fix_memory_counter_update_in_reorderbuffer_v2.patch
Description: Binary data


test.patch
Description: Binary data


Re: [PATCH] Add get_bytes() and set_bytes() functions

2024-08-16 Thread Yugo Nagata
On Thu, 15 Aug 2024 13:58:03 +0300
Aleksander Alekseev  wrote:

> Hi,
> 
> > Perhaps we should also add casts between bytea and the integer/numeric
> > types. That might be easier to use than functions in some
> > circumstances.
> >
> > When casting a numeric to an integer, the result is rounded to the
> > nearest integer, and NaN/Inf generate errors, so we should probably do
> > the same here.
> 
> Yes, I was also thinking about adding NUMERIC versions of get_bytes()
> / set_bytes(). This would allow converting more than 8 bytes to/from
> an integer. I dropped this idea because I thought there would not be
> much practical use for it. On the flip side, you never know who uses
> Postgres and for what purpose.
> 
> I will add corresponding casts unless the idea gets push-back
> from the community. IMO the existence of these casts will at least not
> make things worse.

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned at the start of the thread, that is,
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem to be resolved.

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, would variations where get_bytes() returns
bytea instead of bigint, and set_bytes() receives bytea as the newvalue
argument, be useful? I wonder if that would eliminate the restriction
that size cannot be larger than 8.
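
For what it's worth, such bytea-returning variations can already be
approximated with the existing binary-string functions; a rough sketch
of what they would amount to:

-- a "get_bytes() returning bytea" is essentially substr() on bytea:
SELECT substr('\x1122334455667788'::bytea, 2, 2);   -- \x2233

-- a "set_bytes() taking bytea" is essentially overlay() on bytea:
SELECT overlay('\x1122334455667788'::bytea
               placing '\xaabb'::bytea from 2 for 2);  -- \x11aabb4455667788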


Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+   while(size)
+   {

I wonder whether inserting a space after "while" is the standard style.

Regards,
Yugo Nagata

-- 
Yugo Nagata 




Re: [PATCH] Add get_bytes() and set_bytes() functions

2024-08-16 Thread Aleksander Alekseev
Hi,

> When we add such casts between bytea and the integer/numeric types,
> one of the problems mentioned at the start of the thread, that is,
> "we don't have a convenient way of casting a bytea to an integer / bigint
> and vice versa", would seem to be resolved.
>
> On the other hand, I suppose get_bytes() and set_bytes() are still useful
> for extracting bytes from byteas, etc. If casting is no longer the main
> purpose of these functions, would variations where get_bytes() returns
> bytea instead of bigint, and set_bytes() receives bytea as the newvalue
> argument, be useful? I wonder if that would eliminate the restriction
> that size cannot be larger than 8.

No, casting between bytea and numeric will not replace get_bytes() /
set_bytes() for performance reasons.

Consider the case when you want to extract an int4 from a bytea.
get_bytes() is going to be very fast, while a substr() -> ::numeric ->
::integer chain will require unnecessary copying and conversions.
Casting between bytea and numeric is only useful when one has to deal
with integers larger than 8 bytes. Whether this happens often is a
debatable question.
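
To make the comparison concrete, a sketch (get_bytes() as proposed in
the patch; the cast chain relies on the bytea -> numeric casts that are
only being discussed, so it is shown commented out):

-- proposed function: read 4 bytes starting at offset 4 as an integer
SELECT get_bytes('\x1122334455667788'::bytea, 4, 4);
-- bytes 55 66 77 88, i.e. 0x55667788

-- the slower alternative it avoids, once such casts exist:
--   SELECT substr('\x1122334455667788'::bytea, 5, 4)::numeric::integer;
-- each step materializes and converts the value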

> Here are my very trivial comments on the patch.
>
> + * this routine treats "bytea" as an array of bytes.
>
> Maybe, the sentence should start with "This ... ".
>
> +   while(size)
> +   {
>
> I wonder whether inserting a space after "while" is the standard style.

Thanks, fixed.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Add-get_bytes-and-set_bytes-functions.patch
Description: Binary data


[PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2024-08-16 Thread Nishant Sharma
Hi,


--
Actual column names used during creation of a foreign table are not
allowed to be an empty string, but when we use column_name as an empty
string in OPTIONS during CREATE or ALTER of foreign tables, it is
allowed.

EXAMPLES:-
1) CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5)) SERVER
localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
ERROR:  zero-length delimited identifier at or near """"
LINE 1: CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5))...

2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name
''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public',
table_name 'test');
CREATE FOREIGN TABLE

postgres@43832=# \d test_fdw
                             Foreign table "public.test_fdw"
 Column |         Type          | Collation | Nullable | Default |   FDW options
--------+-----------------------+-----------+----------+---------+------------------
 name   | character varying(15) |           |          |         | (column_name '')
 id     | character varying(5)  |           |          |         |
Server: localhost_fdw
FDW options: (schema_name 'public', table_name 'test')

--

Due to the above, when we try to simply select a remote table, the remote
query uses
the empty column name from the FDW column option and the select fails.

EXAMPLES:-
1) select * from test_fdw;
ERROR:  zero-length delimited identifier at or near """"
CONTEXT:  remote SQL command: SELECT "", id FROM public.test

2) explain verbose select * from test_fdw;
QUERY PLAN
--
 Foreign Scan on public.test_fdw  (cost=100.00..297.66 rows=853 width=72)
   Output: name, id
   Remote SQL: SELECT "", id FROM public.test
(3 rows)

--

We can fix this issue either during fetching of FDW column option names
while building the remote query, or by not allowing it at CREATE or
ALTER of foreign tables itself.
We think it would be better to disallow adding an empty column_name
option in CREATE or ALTER itself, as we do not allow empty actual
column names for a foreign table. Unless I have misunderstood the
purpose of allowing column_name to be empty.

THE PROPOSED SOLUTION OUTPUT:-
1) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name
''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public',
table_name 'test');
ERROR:  column generic option name cannot be empty

2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) SERVER
localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
CREATE FOREIGN TABLE

ALTER FOREIGN TABLE test_fdw ALTER COLUMN id OPTIONS (column_name '');
ERROR:  column generic option name cannot be empty

--

PFA the fix and test-case patches. I ran "make check-world" and do not
see any failure related to the patches, but I do see an existing,
unrelated failure in t/001_pgbench_with_server.pl.


Regards,
Nishant.

P.S
Thanks to Jeevan Chalke and Suraj Kharage for their inputs for the proposal.


Re: Pgoutput not capturing the generated columns

2024-08-16 Thread vignesh C
On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
 wrote:
>
> On Thu, Aug 8, 2024 at 12:43 PM Peter Smith  wrote:
> >
> > Hi Shubham,
> >
> > I think the v25-0001 patch only half-fixes the problems reported in my
> > v24-0001 review.
> >
> > ~
> >
> > Background (from the commit message):
> > This commit enables support for the 'include_generated_columns' option
> > in logical replication, allowing the transmission of generated column
> > information and data alongside regular table changes.
> >
> > ~
> >
> > The broken TAP test scenario in question is replicating from a
> > "not-generated" column to a "generated" column. As the generated
> > column is not on the publishing side, IMO the
> > 'include_generated_columns' option should have zero effect here.
> >
> > In other words, I expect this TAP test for 'include_generated_columns
> > = true' case should also be failing, as I wrote already yesterday:
> >
> > +# FIXME
> > +# Since there is no generated column on the publishing side this should give
> > +# the same result as the previous test. -- e.g. something like:
> > +# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is
> > +# missing replicated column: "b"
>
I have fixed the issues from the given comments. The attached v26-0001
patch contains the required changes.

Few comments:
1) There's no need to pass include_generated_columns in this case; we
can retrieve it from ctx->data instead:
@@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
 static void
 send_relation_and_attrs(Relation relation, TransactionId xid,
LogicalDecodingContext *ctx,
-   Bitmapset *columns)
+   Bitmapset *columns,
bool include_generated_columns)
 {
TupleDesc   desc = RelationGetDescr(relation);
int i;
@@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation,
TransactionId xid,

2) Commit message:
If the subscriber-side column is also a generated column then this option
has no effect; the replicated data will be ignored and the subscriber
column will be filled as normal with the subscriber-side computed or
default data.

An error will occur in this case, so the message should be updated accordingly.

3) The current test is structured as follows: a) Create all required
tables b) Insert data into tables c) Create publications d) Create
subscriptions e) Perform inserts and verify
This approach can make reviewing and maintenance somewhat challenging.

Instead, could you modify it to: a) Create the required table for a
single test b) Insert data for this test c) Create the publication for
this test d) Create the subscriptions for this test e) Perform inserts
and verify f) Clean up

4) We can maintain the test as a separate 0002 patch, as it may need a
few rounds of review and final adjustments. Once it's fully completed,
we can merge it back in.

5) Once we create and drop publication/subscriptions for individual
tests, we won't need such extensive configuration; we should be able
to run them with default values:
+$node_publisher->append_conf(
+   'postgresql.conf',
+   "max_wal_senders = 20
+max_replication_slots = 20");

Regards,
Vignesh




Re: Parallel CREATE INDEX for BRIN indexes

2024-08-16 Thread Tomas Vondra
On 8/15/24 15:48, Peter Eisentraut wrote:
> On 13.04.24 23:04, Tomas Vondra wrote:
 While preparing a differential code coverage report between 16 and
 HEAD, one
 thing that stands out is the parallel brin build code. Neither on
 coverage.postgresql.org nor locally is that code reached during our
 tests.

>>>
>>> Thanks for pointing this out, it's definitely something that I need to
>>> improve (admittedly, should have been part of the patch). I'll also look
>>> into eliminating the difference between BTREE and BRIN parallel builds,
>>> mentioned in my last message in this thread.
>>>
>>
>> Here's a couple patches adding a test for the parallel CREATE INDEX with
>> BRIN. The actual test is 0003/0004 - I added the test to pageinspect,
>> because that allows cross-checking the index to one built without
>> parallelism, which I think is better than just doing CREATE INDEX
>> without properly testing it produces correct results.
> 
> These pageinspect tests added a new use of the md5() function.  We got
> rid of those in the tests for PG17.  You could write the test case with
> something like
> 
>  SELECT (CASE WHEN (mod(i,231) = 0) OR (i BETWEEN 3500 AND 4000) THEN NULL ELSE i END),
> -   (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE md5(i::text) END),
> +   (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE encode(sha256(i::text::bytea), 'hex') END),
>     (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3850 AND 4500) THEN NULL ELSE (i/100) + mod(i,8) END)
> 
> But this changes the test output slightly and I'm not sure if this gives
> you the data distribution that you need for your test.  Could you check
> this please?
> 

I think this is fine. The output only changes because sha256 produces
longer values than md5, so the summaries are longer and the index gets
one page longer. AFAIK that has no impact on the test.
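
For reference, the width difference driving that (hex-encoded sha256 is
exactly twice as wide as md5's text output):

SELECT length(md5('1')) AS md5_len,
       length(encode(sha256('1'::bytea), 'hex')) AS sha256_len;
-- => 32 and 64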


regards

-- 
Tomas Vondra




Re: Conflict detection and logging in logical replication

2024-08-16 Thread shveta malik
On Fri, Aug 16, 2024 at 12:19 PM Amit Kapila  wrote:
>
> On Fri, Aug 16, 2024 at 11:48 AM shveta malik  wrote:
> >
> > On Fri, Aug 16, 2024 at 10:46 AM shveta malik  
> > wrote:
> > >
> > > 3)
> > > For update_exists(), we dump:
> > > Key (a, b)=(2, 1)
> > >
> > > For delete_missing, update_missing, update_differ, we dump:
> > > Replica identity (a, b)=(2, 1).
> > >
> > > For update_exists as well, shouldn't we dump 'Replica identity'? Only
> > > for insert case, it should be referred as 'Key'.
> > >
> >
> > On rethinking, is it because for update_exists case 'Key' dumped is
> > not the one used to search the row to be updated? Instead it is the
> > one used to search the conflicting row. Unlike update_differ, the row
> > to be updated and the row currently conflicting will be different for
> > update_exists case. I earlier thought that 'KEY' and 'Existing local
> > tuple' dumped always belong to the row currently being
> > updated/deleted/inserted. But for 'update_exists', that is not the
> > case. We are dumping 'Existing local tuple' and 'Key' for the row
> > which is conflicting and not the one being updated.  Example:
> >
> > ERROR:  conflict detected on relation "public.tab_1": conflict=update_exists
> > Key (a, b)=(2, 1); existing local tuple (2, 1); remote tuple (2, 1).
> >
> > Operations performed were:
> > Pub: insert into tab values (1,1);
> > Sub: insert into tab values (2,1);
> > Pub: update tab set a=2 where a=1;
> >
> > Here Key and local tuple are both 2,1 instead of 1,1. While replica
> > identity value (used to search original row) will be 1,1 only.
> >
> > It may be slightly confusing, or tricky, to understand when compared
> > to other conflicts' LOGs. But I'm not sure what better we can do here.
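
(For reference, the quoted sequence as plain SQL; schema assumed: tab_1
with a primary key on (a, b), already being replicated:

-- publisher
INSERT INTO tab_1 VALUES (1, 1);
-- subscriber
INSERT INTO tab_1 VALUES (2, 1);
-- publisher; applying this UPDATE on the subscriber raises update_exists
UPDATE tab_1 SET a = 2 WHERE a = 1;
)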
> >
>
> The update_exists behaves more like insert_exists, as we detect it
> only while inserting into the index. It is also not clear to me if we
> can do better than to clarify this in the docs.
>

Instead of 'existing local tuple', would it be slightly better to have
'conflicting local tuple'?

Few trivial comments:

1)
errdetail_apply_conflict() header says:

 * 2. Display of conflicting key, existing local tuple, remote new tuple, and
 *    replica identity columns, if any.

We may mention that it is the existing *conflicting* local tuple.

Looking at build_tuple_value_details(), the cases where we display
'Key' and the ones where we display 'replica identity' are mutually
exclusive (we have ASSERTs to that effect). Shall we add to the header
that either 'Key' or 'replica identity' is displayed? Or, if we don't
want to make them mutually exclusive, then update_exists is one such
case where we can have both 'Key' and the replica identity columns.


2)
BuildIndexValueDescription() header comment says:

 * This is currently used
 * for building unique-constraint, exclusion-constraint and logical replication
 * tuple missing conflict error messages

Is it being used only for the 'tuple missing conflict' flow? I thought
it would be hit for other flows as well.

thanks
Shveta




Re: race condition in pg_class

2024-08-16 Thread Heikki Linnakangas

On 14/07/2024 20:48, Noah Misch wrote:

I've pushed the two patches for your reports.  To placate cfbot, I'm attaching
the remaining patches.


inplace090-LOCKTAG_TUPLE-eoxact-v8.patch: Makes sense. A comment would 
be in order, it looks pretty random as it is. Something like:


/*
 * Tuple locks are currently only held for short durations within a
 * transaction. Check that we didn't forget to release one.
 */

inplace110-successors-v8.patch: Makes sense.

The README changes would be better as part of the third patch, as this 
patch doesn't actually do any of the new locking described in the 
README, and it fixes the "inplace update updates wrong tuple" bug even 
without those tuple locks.



+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ * elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ * heap_inplace_update_finish(inplace_state, tup);
+ * else
+ * heap_inplace_update_cancel(inplace_state);


I wonder if the functions should be called "systable_*" and placed in 
genam.c rather than in heapam.c. The interface looks more like the 
existing systable functions. It feels like a modularity violation for a 
function in heapam.c to take an argument like "indexId", and call back 
into systable_* functions.



/*----------
 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
 *
 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
 * ["R" is a VACUUM tbl]
 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
 * D: systable_getnext() returns pg_class tuple of tbl
 * R: memcpy() into pg_class tuple of tbl
 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
 * [crash]
 * [recovery restores datfrozenxid w/o relfrozenxid]
 */


Hmm, that's a tight race, but feels bad to leave it unfixed. One 
approach would be to modify the tuple on the buffer only after 
WAL-logging it. That way, D cannot read the updated value before it has 
been WAL logged. Just need to make sure that the change still gets 
included in the WAL record. Maybe something like:


if (RelationNeedsWAL(relation))
{
/*
 * Make a temporary copy of the page that includes the change, in
 * case a full-page image is logged
 */
PGAlignedBlock tmppage;

memcpy(tmppage.data, page, BLCKSZ);

/* copy the tuple to the temporary copy */
memcpy(...);

XLogRegisterBlock(0, ..., tmppage, REGBUF_STANDARD);
XLogInsert();
}

/* copy the tuple to the buffer */
memcpy(...);



  pg_class heap_inplace_update_scan() callers: before the call, acquire
  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
  on the table.  Locking the index rel is optional.  (This allows VACUUM to
  overwrite per-index pg_class while holding a lock on the table alone.)  We
  could allow weaker locks, in which case the next paragraph would simply call
  for stronger locks for its class of commands.  heap_inplace_update_scan()
  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
  ExclusiveLock, on each tuple it overwrites.

  pg_class heap_update() callers: before copying the tuple to modify, take a
  lock that conflicts with at least one of those from the preceding paragraph.
  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
  to acquire just the LOCKTAG_RELATION.


These rules seem complicated. Phrasing this slightly differently, if I 
understand correctly: for a heap_update() caller, it's always sufficient 
to hold LOCKTAG_TUPLE, but if you happen to hold some other lock on the 
relation that conflicts with those mentioned in the first paragraph, 
then you can skip the LOCKTAG_TUPLE lock.


Could we just stipulate that you must always hold LOCKTAG_TUPLE when you 
call heap_update() on pg_class or pg_database? That'd make the rule simple.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [PATCH] Add get_bytes() and set_bytes() functions

2024-08-16 Thread Peter Eisentraut

On 14.08.24 13:01, Aleksander Alekseev wrote:

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

Examples:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
  ?column?
--
  t

SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
  set_bytes

  \x11aabb4455667788
```


I think these functions do about three things at once, and I don't think 
they address the originally requested purpose very well.


Converting between integers and byte arrays of matching size seems like 
reasonable functionality.  (You can already do one half of that by 
calling int2send(), int4send(), and int8send(), but the other direction 
(intXrecv()) is not user-callable).


The other things are extracting that byte array from a larger byte array 
and sticking it back into a larger byte array; those seem like separate 
operations.  There is already substr() for bytea for the first part, and 
there might be another string-like operationg for the second part, or 
maybe we could add one.
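
For example, that combination is already expressible today with
int4send() for the integer-to-bytea direction and substr()/overlay()
for the slicing -- a sketch, not a claim that it covers all of the
proposal's use cases:

SELECT int4send(305419896);                         -- \x12345678

SELECT substr('\x001122334455'::bytea, 2, 4);       -- \x11223344

SELECT overlay('\x1122334455667788'::bytea
               placing int4send(305419896) from 2 for 4);
                                                    -- \x1112345678667788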





Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

2024-08-16 Thread Michail Nikolaev
Hello, everyone.

I have updated the spec to reproduce the issue, now it includes cases with
both CREATE INDEX and REINDEX.

To run:
  make -C src/test/modules/injection_points/ check

The issue is reproduced on an empty index, but it may happen on an
index of any size with the same probability.
It is not critical, of course, but in production systems indexes are
regularly rebuilt using REINDEX CONCURRENTLY, as recommended in the
documentation [1].
In most cases it is done using pg_repack, as far as I know.

So, in these production systems, there is no guarantee that INSERT ON
CONFLICT DO NOTHING/UPDATE will not fail with a "duplicate key value
violates unique constraint" error.

Best regards,
Mikhail.

[1]: https://www.postgresql.org/docs/current/routine-reindex.html

From e41389c4a873fbf7dd28907cb4624dffc56cb3d9 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Fri, 16 Aug 2024 11:19:37 +0200
Subject: [PATCH v2] specs to reproduce issues with CREATE INDEX/REINDEX
 CONCURRENTLY for UNIQUE indexes and INSERT ON CONFLICT UPDATE using injection
 points

---
 src/backend/commands/indexcmds.c  |   5 +-
 src/backend/executor/execIndexing.c   |   3 +
 src/backend/executor/nodeModifyTable.c|   2 +
 src/backend/utils/time/snapmgr.c  |   2 +
 src/test/modules/injection_points/Makefile|   2 +-
 .../expected/index_concurrently_upsert.out|  80 ++
 .../expected/reindex_concurrently_upsert.out  | 238 ++
 src/test/modules/injection_points/meson.build |   2 +
 .../specs/index_concurrently_upsert.spec  |  68 +
 .../specs/reindex_concurrently_upsert.spec|  86 +++
 10 files changed, 486 insertions(+), 2 deletions(-)
 create mode 100644 src/test/modules/injection_points/expected/index_concurrently_upsert.out
 create mode 100644 src/test/modules/injection_points/expected/reindex_concurrently_upsert.out
 create mode 100644 src/test/modules/injection_points/specs/index_concurrently_upsert.spec
 create mode 100644 src/test/modules/injection_points/specs/reindex_concurrently_upsert.spec

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88aa5..822467bdd5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -69,6 +69,7 @@
 #include "utils/regproc.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/injection_point.h"
 
 
 /* non-export function prototypes */
@@ -1775,6 +1776,7 @@ DefineIndex(Oid tableId,
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
  PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForOlderSnapshots(limitXmin, true);
+	INJECTION_POINT("define_index_before_set_valid");
 
 	/*
 	 * Index can now be marked valid -- update its pg_index entry
@@ -4078,7 +4080,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * the same time to make sure we only get constraint violations from the
 	 * indexes with the correct names.
 	 */
-
+	INJECTION_POINT("reindex_relation_concurrently_before_swap");
 	StartTransactionCommand();
 
 	/*
@@ -4152,6 +4154,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
  PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+	INJECTION_POINT("reindex_relation_concurrently_before_set_dead");
 
 	foreach(lc, indexIds)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9f05b3654c..1d451a329a 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -115,6 +115,7 @@
 #include "nodes/nodeFuncs.h"
 #include "storage/lmgr.h"
 #include "utils/snapmgr.h"
+#include "utils/injection_point.h"
 
 /* waitMode argument to check_exclusion_or_unique_constraint() */
 typedef enum
@@ -901,6 +902,8 @@ retry:
 	econtext->ecxt_scantuple = save_scantuple;
 
 	ExecDropSingleTupleTableSlot(existing_slot);
+	if (!conflict)
+		INJECTION_POINT("check_exclusion_or_unique_constraint_no_conflict");
 
 	return !conflict;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4913e49319..65bc63c612 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -69,6 +69,7 @@
 #include "utils/datum.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
+#include "utils/injection_point.h"
 
 
 typedef struct MTTargetRelLookup
@@ -1084,6 +1085,7 @@ ExecInsert(ModifyTableContext *context,
 	return NULL;
 }
 			}
+			INJECTION_POINT("exec_insert_before_insert_speculative");
 
 			/*
 			 * Before we start insertion proper, acquire our "speculative
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7d2b34d4f2..3a7357a050 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -64,6 +64,7 @@
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
 #i

Re: Useless parameter 'cur_skey' in IndexScanOK

2024-08-16 Thread Heikki Linnakangas

On 03/07/2024 16:46, Daniel Gustafsson wrote:

On 3 Jul 2024, at 15:41, Aleksander Alekseev  wrote:

The 'cur_skey' parameter of the `IndexScanOK` function doesn't seem to be useful.



Good catch. As I understand it, it has not been used for anything
since a78fcfb51243 (dated 2006), and this is a static function, so we
shouldn't worry about third-party extensions.


Agreed, it seems reasonable to clean this up.


I wonder why none of the compilers complained about this.


Not to mention static analyzers.


Committed, thanks.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [PATCH] Add get_bytes() and set_bytes() functions

2024-08-16 Thread Yugo Nagata
On Fri, 16 Aug 2024 11:41:37 +0300
Aleksander Alekseev  wrote:

> Hi,
> 
> > When we add such casts between bytea and the integer/numeric types,
> > one of the problems mentioned at the start of the thread, that is,
> > "we don't have a convenient way of casting a bytea to an integer / bigint
> > and vice versa", would seem to be resolved.
> >
> > On the other hand, I suppose get_bytes() and set_bytes() are still useful
> > for extracting bytes from byteas, etc. If casting is no longer the main
> > purpose of these functions, would variations where get_bytes() returns
> > bytea instead of bigint, and set_bytes() receives bytea as the newvalue
> > argument, be useful? I wonder if that would eliminate the restriction
> > that size cannot be larger than 8.
> 
> No, casting between bytea and numeric will not replace get_bytes() /
> set_bytes() for performance reasons.
> 
> Consider the case when you want to extract an int4 from a bytea.
> get_bytes() is going to be very fast, while a substr() -> ::numeric ->
> ::integer chain will require unnecessary copying and conversions.
> Casting between bytea and numeric is only useful when one has to deal
> with integers larger than 8 bytes. Whether this happens often is a
> debatable question.

Thank you for the explanation; I understand the performance drawback.

I was supposing interfaces similar to lo_get, lo_put, loread, and
lowrite for large objects, since they might be useful for accessing or
modifying a part of a bytea, like a binary file read by
pg_read_binary_file.

> 
> > Here are my very trivial comments on the patch.
> >
> > + * this routine treats "bytea" as an array of bytes.
> >
> > Maybe, the sentence should start with "This ... ".
> >
> > +   while(size)
> > +   {
> >
> > I wonder whether inserting a space after "while" is the standard style.
> 
> Thanks, fixed.

Should we fix the comment on byteaGetByte in passing, too?

Regards,
Yugo Nagata

-- 
Yugo Nagata 




Re: refactor the CopyOneRowTo

2024-08-16 Thread Heikki Linnakangas

On 31/07/2024 16:30, Junwang Zhao wrote:

On Fri, Jul 5, 2024 at 12:26 AM jian he  wrote:

Overall less "if else" logic; also, the copy format doesn't change
during COPY, so we don't need to check binary or not for each datum
value.


Committed, thanks.

For the archives: this code is in a very hot path during COPY TO, so I 
did some quick micro-benchmarking on my laptop. I used this:


COPY (select 
0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10 
from generate_series(1, 1_000_000) g) TO '/dev/null';


and

COPY (select 0 from generate_series(1, 1_000_000) g) TO '/dev/null';

to check the impact with few attributes and with many attributes. I 
repeated that a few times with \timing with and without the patch, and 
eyeballed the result. I also used 'perf' to check the fraction of CPU 
time spent in CopyOneRowTo. Conclusion: the patch has no performance impact.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Vacuum statistics

2024-08-16 Thread jian he
On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina
 wrote:
>
> Hi!


I've applied all the v5 patches.
0002 and 0003 have whitespace errors.

+  
+Number of times blocks of this index were already found
+in the buffer cache by vacuum operations, so that a read was not necessary
+(this only includes hits in the
+&project; buffer cache, not the operating system's file system cache)
+  

+Number of times blocks of this table were already found
+in the buffer cache by vacuum operations, so that a read was not necessary
+(this only includes hits in the
+&project; buffer cache, not the operating system's file system cache)
+  

"&project;"
represents a sgml file placeholder name as "project" and puts all the
content of "project.sgml" to system-views.sgml.
but you don't have "project.sgml". you may check
doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml
for usage of "&place_holder;".
so you can change it to "project", otherwise doc cannot build.


src/backend/commands/dbcommands.c
we have:
/*
 * If built with appropriate switch, whine when regression-testing
 * conventions for database names are violated.  But don't complain during
 * initdb.
 */
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)
	elog(WARNING, "databases created by regression test cases should have names including \"regression\"");
#endif
so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you
need to change dbname:
CREATE DATABASE statistic_vacuum_database;
CREATE DATABASE statistic_vacuum_database1;


+  
+   The view pg_stat_vacuum_indexes will contain
+   one row for each index in the current database (including TOAST
+   table indexes), showing statistics about vacuuming that specific index.
+  
The plain "TOAST" here should use the <acronym>TOAST</acronym> markup.



+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
maybe change to
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
 errmsg("return type must be a row type")));
Later I found out that "InitMaterializedSRF(fcinfo, 0);" already does
all the work, so much of the code can be gotten rid of.
Please check the attached.



#define EXTVACHEAPSTAT_COLUMNS	27
#define EXTVACIDXSTAT_COLUMNS	19
#define EXTVACDBSTAT_COLUMNS	15
#define EXTVACSTAT_COLUMNS	Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)

static Oid CurrentDatabaseId = InvalidOid;

We already define MyDatabaseId in src/include/miscadmin.h;
why do we need "static Oid CurrentDatabaseId = InvalidOid;"?
Also, src/backend/utils/adt/pgstatfuncs.c already includes "miscadmin.h".




In the following code, each function has two return statements, so the
trailing PG_RETURN_NULL() is unreachable:

/*
 * Get the vacuum statistics for the heap tables.
 */
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);

PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the indexes.
 */
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);

PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the database.
 */
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);

PG_RETURN_NULL();
}
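
A sketch of the obvious cleanup, assuming pg_stats_vacuum() is meant to
return the Datum directly (the same applies to the other two functions):

Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
	/* already returns a Datum; the unreachable PG_RETURN_NULL() is gone */
	return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);
}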

in pg_stats_vacuum:
if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
	Oid			relid = PG_GETARG_OID(1);

	/* Load table statistics for the specified database. */
	if (OidIsValid(relid))
	{
		tabentry = fetch_dbstat_tabentry(dbid, relid);
		if (tabentry == NULL || tabentry->vacuum_ext.type != type)
			/* Table doesn't exist or isn't a heap relation. */
			PG_RETURN_NULL();

		tuplestore_put_for_relation(relid, tupstore, tupdesc,
									tabentry, ncolumns);
	}
	else
	{
		...
	}
}
}
I don't understand the ELSE branch. The IF branch means the input
dboid and heap/index oid are valid; the ELSE branch means the table
reloid is invalid (0). I am not 100% sure what the ELSE branch is for,
and there are no comments explaining why.
Experiments seem to show that when reloid is 0, it prints out the
vacuum statistics for all the tables in the current database. If so,
should only superusers be able to call pg_stats_vacuum? The table
owner should still be able to call pg_stats_vacuum for that specific
table.




/* Type of ExtVacReport */
typedef enum ExtVacReportType
{
PGSTAT_EXTVAC_INVALID = 0,
PGSTAT_EXTVAC_HEAP = 1,
PGSTAT_EXTVAC_INDEX = 2,
PGSTAT_EXTVAC_DB = 3,
} ExtVacReportType;
gener

Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2024-08-16 Thread Tomas Vondra
Hi Ayush,

On 8/13/24 07:37, Ayush Tiwari wrote:
> Hi hackers,
> 
> We encountered an issue lately, that if the database grants too many
> roles `datacl` is toasted, following which, the drop database command
> will fail with error "wrong tuple length".
> 
> To reproduce the issue, please follow below steps:
> 
> CREATE DATABASE test;
> 
> -- create helper function
> CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
> declare
>     mycounter int;
> begin
>     for mycounter in select i from generate_series(1,2000) i loop
>         execute 'CREATE ROLE aaabb' || mycounter;
>         execute 'GRANT ALL ON DATABASE test to aaabb' || mycounter;
>     end loop;
>     return 'ok';
> end;
> $body$ language plpgsql volatile strict;
> 
> -- create roles and grant on the database.
> SELECT data_tuple(); 
> 
> -- drop database command, this will result in "wrong tuple length" error.
> DROP DATABASE test;
> 
> The root cause of this behaviour is that the dropdb function fetches
> a copy of the pg_database tuple from the system cache, but the system
> cache flattens any toast attributes, which causes the length check to
> fail in heap_inplace_update.
> 
> A patch for this issue is attached to the mail, the solution is to
> change the logic to fetch the tuple by directly scanning pg_database
> rather than using the catcache.
> 

Thanks for the report. I can reproduce the issue following your
instructions, and the fix seems reasonable ...


But there's also one thing I don't quite understand. I did look for
other places that might have a similar issue, that is, places that

  1) look up a tuple using SearchSysCacheCopy1

  2) call heap_inplace_update on that tuple


And I found about four places doing that:

  - index_update_stats (src/backend/catalog/index.c)

  - create_toast_table (src/backend/catalog/toasting.c)

  - vac_update_relstats / vac_update_datfrozenxid (commands/vacuum.c)


But I haven't managed to trigger the same kind of failure for any of
those places, despite trying. AFAIK that's because those places update
pg_class, and that doesn't have TOAST, so the tuple length can't change.
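
One quick way to see that difference (pg_class has no TOAST table,
while pg_database does, which is why a large datacl can be stored
out-of-line):

SELECT relname, reltoastrelid <> 0 AS has_toast
FROM pg_class
WHERE relname IN ('pg_class', 'pg_database');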


So this fix seems reasonable.

-- 
Tomas Vondra




Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2024-08-16 Thread Nishant Sharma
Oops...
I forgot to attach the patch. Thanks to Amul Sul for pointing that out. :)



On Fri, Aug 16, 2024 at 2:37 PM Nishant Sharma <
nishant.sha...@enterprisedb.com> wrote:

> Hi,
>
>
>
> --
> Actual column names used during creation of a foreign table are not
> allowed to be an empty string, but when we use column_name as an empty
> string in OPTIONS during CREATE or ALTER of foreign tables, it is
> allowed.
>
> EXAMPLES:-
> 1) CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5)) SERVER
> localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
> ERROR:  zero-length delimited identifier at or near """"
> LINE 1: CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5))...
>
> 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name
> ''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public',
> table_name 'test');
> CREATE FOREIGN TABLE
>
> postgres@43832=# \d test_fdw
>                              Foreign table "public.test_fdw"
>  Column |         Type          | Collation | Nullable | Default |   FDW options
> --------+-----------------------+-----------+----------+---------+------------------
>  name   | character varying(15) |           |          |         | (column_name '')
>  id     | character varying(5)  |           |          |         |
> Server: localhost_fdw
> FDW options: (schema_name 'public', table_name 'test')
>
>
> --
>
> Due to the above, when we try to simply select a remote table, the remote
> query uses
> the empty column name from the FDW column option and the select fails.
>
> EXAMPLES:-
> 1) select * from test_fdw;
> ERROR:  zero-length delimited identifier at or near """"
> CONTEXT:  remote SQL command: SELECT "", id FROM public.test
>
> 2) explain verbose select * from test_fdw;
> QUERY PLAN
> --
>  Foreign Scan on public.test_fdw  (cost=100.00..297.66 rows=853 width=72)
>Output: name, id
>Remote SQL: SELECT "", id FROM public.test
> (3 rows)
>
>
> --
>
> We can fix this issue either during fetching of FDW column option names
> while building the remote query, or by not allowing it at CREATE or
> ALTER of foreign tables itself.
> We think it would be better to disallow adding an empty column_name
> option in CREATE or ALTER itself, as we do not allow empty actual
> column names for a foreign table. Unless I have misunderstood the
> purpose of allowing column_name to be empty.
>
> THE PROPOSED SOLUTION OUTPUT:-
> 1) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name
> ''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public',
> table_name 'test');
> ERROR:  column generic option name cannot be empty
>
> 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) SERVER
> localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
> CREATE FOREIGN TABLE
>
> ALTER FOREIGN TABLE test_fdw ALTER COLUMN id OPTIONS (column_name '');
> ERROR:  column generic option name cannot be empty
>
>
> --
>
> PFA the fix and test-case patches. I ran "make check-world" and do not
> see any failure related to the patches, but I do see an existing,
> unrelated failure in t/001_pgbench_with_server.pl.
>
>
> Regards,
> Nishant.
>
> P.S
> Thanks to Jeevan Chalke and Suraj Kharage for their inputs for the
> proposal.
>


v1-0001-Disallow-empty-Foreign-Table-column-option-name-i.patch
Description: Binary data


v1-0002-Test-Cases-Changes.patch
Description: Binary data


Re: Conflict detection and logging in logical replication

2024-08-16 Thread Michail Nikolaev
Hello!

> I think you might misunderstand the behavior of CheckAndReportConflict();
> even if it found a conflict, it still inserts the tuple into the index,
> which means the change is applied anyway.

> In the above conditions where a concurrent tuple insertion is removed
> or rolled back before CheckAndReportConflict, the tuple inserted by
> apply will remain. There is no need to report anything in such cases
> as apply was successful.

Yes, thank you for the explanation; I was thinking UNIQUE_CHECK_PARTIAL
worked differently.

But now I think the DirtySnapshot-related bug is a blocker for this
feature; I'll reply in the original thread after rechecking it.

Best regards,
Mikhail.


Ineffective Assert-check in CopyMultiInsertInfoNextFreeSlot()

2024-08-16 Thread Amul Sul
Hi,

The Assert(buffer != NULL) is placed after the buffer is accessed,
which could lead to a segmentation fault before the check is executed.

Attached a small patch to correct that.

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ca85270be6d..2d3462913e1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -597,10 +597,12 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
+	int			nused;
 
 	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	Assert(buffer->nused < MAX_BUFFERED_TUPLES);
+
+	nused = buffer->nused;
 
 	if (buffer->slots[nused] == NULL)
 		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);


Re: Relation bulk write facility

2024-08-16 Thread Heikki Linnakangas

On 03/07/2024 06:41, Noah Misch wrote:

On Tue, Jul 02, 2024 at 02:42:50PM +0300, Heikki Linnakangas wrote:

On 02/07/2024 02:24, Noah Misch wrote:

On Tue, Jul 02, 2024 at 12:53:05AM +0300, Heikki Linnakangas wrote:

Fortunately, fsync() on a file that's already flushed to disk is pretty
cheap.


Yep.  I'm more concerned about future readers wondering why the function is
using LSNs to decide what to do about data that doesn't appear in WAL.  A
comment could be another way to fix that, though.


Agreed, this is all very subtle, and deserves a good comment. What do you
think of the attached?


Looks good.  Thanks.  pgindent doesn't preserve all your indentation, but it
doesn't make things objectionable, either.


Committed, thanks!

(Sorry for the delay, I had forgotten about this already and found it 
only now sedimented at the bottom of my inbox)


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Make query cancellation keys longer

2024-08-16 Thread Robert Haas
On Thu, Aug 15, 2024 at 6:07 PM Heikki Linnakangas  wrote:
> Ok, I've read through that thread now, and opined there too. One
> difference is with libpq option name: My patch adds "protocol_version",
> while Jelte proposes "max_protocol_version". I don't have strong
> opinions on that. I hope the ecosystem catches up to support
> NegotiateProtocolVersion quickly, so that only few people will need to
> set this option. In particular, I hope that there will never be need to
> use "max_protocol_version=3.2", because by the time we introduce version
> 3.3, all the connection poolers that support 3.2 will also implement
> NegotiateProtocolVersion.

In Jelte's design, there end up being two connection parameters. We
tell the server we want max_protocol_version, but we accept that it
might give us something older. If, however, it tries to degrade us to
something lower than min_protocol_version, we bail out. I see you've
gone for a simpler design: you ask the server for protocol_version and
you get that or you die. To be honest, right up until exactly now, I
was assuming we wanted a two-parameter system like that, just because
being able to tolerate a range of protocol versions seems useful.
However, maybe we don't need it. Alternatively, we could do this for
now, and then later we could adjust the parameter so that you can say
protocol_version=3.2-3.7 and the client will ask for 3.7 but tolerate
anything >= 3.2. Hmm, I kind of like that idea.

I think it's likely that the ecosystem will catch up with
NegotiateProtocolVersion once things start breaking. However, I feel
pretty confident that there are going to be glitches. Clients are
going to want to force newer protocol versions to make sure they get
new features, or to make sure that security features that they want to
have (like this one) are enabled. Some users are going to be running
old poolers that can't handle 3.2, or there will be weirder things
where the pooler says it supports it but it doesn't actually work
properly in all cases. There are also non-PG servers that reimplement
the PG wire protocol. I can't really enumerate all the things that go
wrong, but I think there are a number of wire protocol changes that
various people have been wanting for a long while now, and when we
start to get the infrastructure in place to make that practical,
people are going to take advantage of it. So I think we can expect a
number of protocol enhancements and changes -- Peter's transparent
column encryption stuff is another example -- and there will be
mistakes by us and mistakes by others along the way. Allowing users to
specify what protocol version they want is probably an important part
of coping with that.

The documentation in the patch you attached still seems to think
there's an explicit length field for the cancel key. Also, I think it
would be good to split this into two patches, one to bump the protocol
version and a second to change the cancel key stuff. It would
facilitate review, and I also think that bumping the protocol version
is a big enough deal that it should have its own entry in the commit
log.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2024-08-16 Thread Amit Langote
On Fri, Aug 16, 2024 at 12:35 AM Robert Haas  wrote:
> On Thu, Aug 15, 2024 at 8:57 AM Amit Langote  wrote:
> > TBH, it's more of a hunch that people who are not involved in this
> > development might find the new reality, whereby the execution is not
> > racefree until ExecutorRun(), hard to reason about.
>
> I'm confused by what you mean here by "racefree". A race means
> multiple sessions are doing stuff at the same time and the result
> depends on who does what first, but the executor stuff is all
> backend-private. Heavyweight locks are not backend-private, but those
> would be taken in ExectorStart(), not ExecutorRun(), IIUC.

Sorry, yes, I meant ExecutorStart().  A backend that wants to execute
a plan tree from a CachedPlan is in a race with other backends that
might modify tables before ExecutorStart() takes the remaining locks.
That race window is bigger when it is ExecutorStart() that takes the
locks -- not in terms of timing, but in terms of the other code that
can run between GetCachedPlan() returning a partially valid plan and
ExecutorStart() taking the remaining locks, depending on the calling
module.

> > With the patch, CreateQueryDesc() and ExecutorStart() are moved to
> > PortalStart() so that QueryDescs including the PlanState trees for all
> > queries are built before any is run.  Why?  So that if ExecutorStart()
> > fails for any query in the list, we can simply throw out the QueryDesc
> > and the PlanState trees of the previous queries (NOT run them) and ask
> > plancache for a new CachedPlan for the list of queries.  We don't have
> > a way to ask plancache.c to replan only a given query in the list.
>
> I agree that moving this from PortalRun() to PortalStart() seems like
> a bad idea, especially in view of what you write below.
>
> > * There's no longer CCI() between queries in PortalRunMulti() because
> > the snapshots in each query's QueryDesc must have been adjusted to
> > reflect the correct command counter.  I've checked but can't really be
> > sure if the value in the snapshot is all anyone ever uses if they want
> > to know the current value of the command counter.
>
> I don't think anything stops somebody wanting to look at the current
> value of the command counter. I also don't think you can remove the
> CommandCounterIncrement() calls between successive queries, because
> then they won't see the effects of earlier calls. So this sounds
> broken to me.

I suppose you mean CCI between "running" (calling ExecutorRun on)
successive queries.  Then the patch is indeed broken.  If we're to
make that right, the number of CCIs for the multi-query portals will
have to double given the separation of ExecutorStart() and
ExecutorRun() phases.

> Also keep in mind that one of the queries could call a function which
> does something that bumps the command counter again. I'm not sure if
> that creates its own hazzard separate from the lack of CCIs, or
> whether it's just another part of that same issue. But you can't
> assume that each query's snapshot should have a command counter value
> one more than the previous query.
>
> While this all seems bad for the partially-initialized-execution-tree
> approach, I wonder if you don't have problems here with the other
> design, too. Let's say you've the multi-query case and there are 2
> queries. The first one (Q1) is SELECT mysterious_function() and the
> second one (Q2) is SELECT * FROM range_partitioned_table WHERE
> key_column = 42. What if mysterious_function() performs DDL on
> range_partitioned_table? I haven't tested this so maybe there are
> things going on here that prevent trouble, but it seems like executing
> Q1 can easily invalidate the plan for Q2. And then it seems like
> you're basically back to the same problem.

A rule (but not views AFAICS) can lead to the multi-query case (there
might be other ways).  I tried the following, and, yes, the plan for
the query queued by the rule is broken by the execution of that for
the 1st query:

create table foo (a int);
create table bar (a int);
create or replace function foo_trig_func () returns trigger as $$
begin drop table bar cascade; return new.*; end; $$ language plpgsql;
create trigger foo_trig before insert on foo execute function foo_trig_func();
create rule insert_foo AS ON insert TO foo do also insert into bar
values (new.*);
set plan_cache_mode to force_generic_plan ;
prepare q as insert into foo values (1);
execute q;
NOTICE:  drop cascades to rule insert_foo on table foo
ERROR:  relation with OID 16418 does not exist

The ERROR comes from trying to run (actually "initialize") the cached
plan for `insert into bar values (new.*);` which is due to the rule.

Though, it doesn't have to be a cached plan for the breakage to
happen.  You can see the same error without the prepared statement:

insert into foo values (1);
NOTICE:  drop cascades to rule insert_foo on table foo
ERROR:  relation with OID 16418 does not exist

Another example:

create or replace functio

Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Robert Haas
On Thu, Aug 15, 2024 at 6:03 PM Heikki Linnakangas  wrote:
> On the default for "max_protocol_version": I'm pretty disappointed if we
> cannot change the default to "latest". I realize that that won't work
> with poolers that don't implement NegotiateProtocolVersion. But I'm
> afraid if we make the new protocol version opt-in, no one will use it,
> and the poolers etc still won't bother to implement
> NegotiateProtocolVersion for years to come, if ever. We can revisit this
> decision later in the release cycle though. But I'd suggest changing the
> default to "latest" for now, so that more hackers working with
> connection poolers will notice, and we get more testing of the new
> protocol and the negotiation.

In this regard, I think your proposed protocol change (bumping the
cancel-key length) is different from all of the other protocol
enhancement proposals that I can think of. Most people seem to be
interested in adding an optional feature that some clients might want
and other clients might not care about. Peter Eisentraut's transparent
column encryption stuff is an example of that. What Jelte wants to do
here is too, really, because while these facilities seem like they
could be generally useful for poolers -- at least if we could agree on
what to do and work out all the problems -- and could potentially be
used by applications as well, there would no requirement that any
particular application use any of the new facilities and many of them
wouldn't. So in that kind of world, it makes more sense to me to
default to 3.0 unless the user indicates a desire to use a newer
feature. That way, we minimize breakage at very little cost. Desire to
use the new features can be expected to spur some development in
ecosystem projects, and until that work gets done, many setups are
unaffected.

But the cancel key is a whole different kind of thing. I don't expect
people to be motivated to add support for newer protocol versions just
to get a longer cancel key. If we want people to adopt longer cancel
keys, we need to change the client default and accept that there's
going to be a bunch of breakage until everybody fixes their code.

But is that actually a good idea?

I have some misgivings about that. When somebody's stuff stops working
because of some problem that boils down to "we made the cancel key
longer," I think we're going to have some pretty irate users. I
believe everybody would agree in a vacuum that making cancel keys
longer is probably a good idea, but it seems like a relatively minor
benefit for the amount of inconvenience we're going to be imposing on
everyone. On the other hand, the logical conclusion of that argument
is that we should never do it, which I don't really believe either.
I'm actually kind of surprised that nobody else (that I've seen,
anyway) has expressed concern about the fact that that proposal
involves a protocol version bump. Have people not noticed? Does nobody
care? To me, that thread reads like "I'm going to make a fire in the
fireplace" but then there's a footnote that reads "by setting off a
nuclear bomb" but we're only talking about how warm and cozy the fire
will be. :-)

I'm sure you're going to say "it's worth it" -- you wouldn't have
written the patch otherwise -- but I wonder if it's going to feel
worth it to everybody who has to deal with the downstream effects.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: thread-safety: gmtime_r(), localtime_r()

2024-08-16 Thread Thomas Munro
On Tue, Jul 23, 2024 at 10:52 PM Peter Eisentraut  wrote:
> Let's look at what this code actually does.  It just takes the current
> time and then loops through all possible weekdays and months to collect
> and cache the localized names.  The current time or time zone doesn't
> actually matter for this, we just need to fill in the struct tm a bit
> for strftime() to be happy.  We could probably replace this with
> gmtime() to avoid the question about time zone state.  (We probably
> don't even need to call time() beforehand, we could just use time zero.
> But the comment says "We use times close to current time as data for
> strftime().", which is probably prudent.)  (Since we are caching the
> results for the session, we're already not dealing with the hilarious
> hypothetical situation where the weekday and month names depend on the
> actual time, in case that is a concern.)

I think you could even just use a struct tm filled in with an example
date?  Hmm, but it's annoying to choose one, and annoying that POSIX
says there may be other members of the struct, so yeah, I think
gmtime_r(0, tm) makes sense.  It can't be that important, because we
aren't even using dates consistent with tm_wday, so we're assuming
that strftime("%a") only looks at tm_wday.

This change complements CF #5170's change strftime()->strftime_l(), to
make the function fully thread-safe.

Someone could also rewrite it to call
nl_langinfo_l({ABDAY,ABMON,DAY,MON}_1 + n, locale) directly, or
GetLocaleInfoEx(locale_name,
LOCALE_S{ABBREVDAY,ABBREVMONTH,DAY,MONTH}NAME1 + n, ...) on Windows,
but that'd be more code churn.




Re: thread-safety: gmtime_r(), localtime_r()

2024-08-16 Thread Thomas Munro
On Sat, Aug 17, 2024 at 1:09 AM Thomas Munro  wrote:
> This change complements CF #5170's change strftime()->strftime_l(), to
> make the function fully thread-safe.

(Erm, I meant its standard library... of course it has its own global
variables to worry about still.)




Re: [BUG?] check_exclusion_or_unique_constraint false negative

2024-08-16 Thread Michail Nikolaev
Hello!

> In addition, I think the bug is not a blocker for the conflict detection
> feature. As the feature simply reports the current behavior of the logical
> apply worker (either unique violation or tuple missing) without introducing
> any new functionality. Furthermore, I think that the new
> ExecCheckIndexConstraints call after ExecInsertIndexTuples() is not affected
> by the dirty snapshot bug. This is because a tuple has already been inserted
> into the btree before the dirty snapshot scan, which means that a concurrent
> non-HOT update would not be possible (it would be blocked after finding the
> just inserted tuple and wait for the apply worker to commit the current
> transaction).
>
> It would be good if others could also share their opinion on this.

Yes, you are right. At least, I can't find any scenario for that case.

Best regards,
Mikhail.


Re: Make query cancellation keys longer

2024-08-16 Thread Heikki Linnakangas

On 16/08/2024 15:31, Robert Haas wrote:

On Thu, Aug 15, 2024 at 6:07 PM Heikki Linnakangas  wrote:

Ok, I've read through that thread now, and opined there too. One
difference is with libpq option name: My patch adds "protocol_version",
while Jelte proposes "max_protocol_version". I don't have strong
opinions on that. I hope the ecosystem catches up to support
NegotiateProtocolVersion quickly, so that only few people will need to
set this option. In particular, I hope that there will never be need to
use "max_protocol_version=3.2", because by the time we introduce version
3.3, all the connection poolers that support 3.2 will also implement
NegotiateProtocolVersion.


In Jelte's design, there end up being two connection parameters. We
tell the server we want max_protocol_version, but we accept that it
might give us something older. If, however, it tries to degrade us to
something lower than min_protocol_version, we bail out. I see you've
gone for a simpler design: you ask the server for protocol_version and
you get that or you die. To be honest, right up until exactly now, I
was assuming we wanted a two-parameter system like that, just because
being able to tolerate a range of protocol versions seems useful.
However, maybe we don't need it. Alternatively, we could do this for
now, and then later we could adjust the parameter so that you can say
protocol_version=3.2-3.7 and the client will ask for 3.7 but tolerate
anything >= 3.2. Hmm, I kind of like that idea.


Works for me.

If we envision accepting ranges like that in the future, it would be 
good to do now rather than later. Otherwise, if someone wants to require 
features from protocol 3.2 today, they will have to put 
"protocol_version=3.2" in the connection string, and later when 3.3 
version is released, their connection string will continue to force the 
then-old 3.2 version.



I think it's likely that the ecosystem will catch up with
NegotiateProtocolVersion once things start breaking. However, I feel
pretty confident that there are going to be glitches. Clients are
going to want to force newer protocol versions to make sure they get
new features, or to make sure that security features that they want to
have (like this one) are enabled. Some users are going to be running
old poolers that can't handle 3.2, or there will be weirder things
where the pooler says it supports it but it doesn't actually work
properly in all cases. There are also non-PG servers that reimplement
the PG wire protocol. I can't really enumerate all the things that go
wrong, but I think there are a number of wire protocol changes that
various people have been wanting for a long while now, and when we
start to get the infrastructure in place to make that practical,
people are going to take advantage of it. So I think we can expect a
number of protocol enhancements and changes -- Peter's transparent
column encryption stuff is another example -- and there will be
mistakes by us and mistakes by others along the way. Allowing users to
specify what protocol version they want is probably an important part
of coping with that.


Yes, it's a good escape hatch to have.


The documentation in the patch you attached still seems to think
there's an explicit length field for the cancel key.


ok thanks


Also, I think it
would be good to split this into two patches, one to bump the protocol
version and a second to change the cancel key stuff. It would
facilitate review, and I also think that bumping the protocol version
is a big enough deal that it should have its own entry in the commit
log.


Right. That's what Jelte's first patches did too. Those changes are more 
or less the same between this patch and his. These clearly need to be 
merged into one "introduce protocol version 3.2" patch.


I'll split this patch like that, to make it easier to compare and merge 
with Jelte's corresponding patches.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread Tom Lane
Peter Eisentraut  writes:
> On 15.08.24 19:25, Matthias van de Meent wrote:
>> Apart from the above issue, I'm -0.5 on what to me equates with 
>> automated spam to -hackers: the volume of mails would put this around 
>> the 16th most common sender on -hackers, with about 400 mails/year 
>> (based on 80 new patches for next CF, and 5 CFs/year, combined with 
>> Robert's 2023 statistics at [0]).

> Yeah, I'd rather not open the can of worms that we send automated emails 
> to this list at all.  If we do this, then there will be other requests, 
> and why this one and not that one.  If people want to get emails from 
> the commitfest app, it should be that you subscribe there and it sends 
> those emails to those who want them.

That would destroy the one good argument for this, which was to
provide an easy way to get from a mail list thread (in the archives)
to the corresponding CF entry or entries.  You pull up the "flat"
thread, search for the bot mail, and click the link.  Without that
I see little point at all.

However, there are other ways to accomplish that.  I liked the
suggestion of extending the CF webapp with a way to search for entries
mentioning a particular mail message ID.  I dunno how hard it'd be to
get it to recognize *any* message-ID from a thread, but surely at
least the head message ought not be too hard to match.

regards, tom lane




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Heikki Linnakangas

On 16/08/2024 15:55, Robert Haas wrote:

On Thu, Aug 15, 2024 at 6:03 PM Heikki Linnakangas  wrote:

On the default for "max_protocol_version": I'm pretty disappointed if we
cannot change the default to "latest". I realize that that won't work
with poolers that don't implement NegotiateProtocolVersion. But I'm
afraid if we make the new protocol version opt-in, no one will use it,
and the poolers etc still won't bother to implement
NegotiateProtocolVersion for years to come, if ever. We can revisit this
decision later in the release cycle though. But I'd suggest changing the
default to "latest" for now, so that more hackers working with
connection poolers will notice, and we get more testing of the new
protocol and the negotiation.


In this regard, I think your proposed protocol change (bumping the
cancel-key length) is different from all of the other protocol
enhancement proposals that I can think of. Most people seem to be
interested in adding an optional feature that some clients might want
and other clients might not care about. Peter Eisentraut's transparent
column encryption stuff is an example of that. What Jelte wants to do
here is too, really, because while these facilities seem like they
could be generally useful for poolers -- at least if we could agree on
what to do and work out all the problems -- and could potentially be
used by applications as well, there would no requirement that any
particular application use any of the new facilities and many of them
wouldn't. So in that kind of world, it makes more sense to me to
default to 3.0 unless the user indicates a desire to use a newer
feature. That way, we minimize breakage at very little cost. Desire to
use the new features can be expected to spur some development in
ecosystem projects, and until that work gets done, many setups are
unaffected.

But the cancel key is a whole different kind of thing. I don't expect
people to be motivated to add support for newer protocol versions just
to get a longer cancel key. If we want people to adopt longer cancel
keys, we need to change the client default and accept that there's
going to be a bunch of breakage until everybody fixes their code.


Agreed.


But is that actually a good idea?

I have some misgivings about that. When somebody's stuff stops working
because of some problem that boils down to "we made the cancel key
longer," I think we're going to have some pretty irate users. I
believe everybody would agree in a vacuum that making cancel keys
longer is probably a good idea, but it seems like a relatively minor
benefit for the amount of inconvenience we're going to be imposing on
everyone.


Agreed.


On the other hand, the logical conclusion of that argument
is that we should never do it, which I don't really believe either.
I'm actually kind of surprised that nobody else (that I've seen,
anyway) has expressed concern about the fact that that proposal
involves a protocol version bump. Have people not noticed? Does nobody
care? To me, that thread reads like "I'm going to make a fire in the
fireplace" but then there's a footnote that reads "by setting off a
nuclear bomb" but we're only talking about how warm and cozy the fire
will be. :-)

I'm sure you're going to say "it's worth it" -- you wouldn't have
written the patch otherwise -- but I wonder if it's going to feel
worth it to everybody who has to deal with the downstream effects.


I didn't realize the issue with poolers is so widespread when I started 
working on that patch this spring, and I gave up hoping to get it into 
v17 when that was brought up.


Now, I think we should still do it, but it might not warrant changing 
the default. Unfortunately that means that it will get very little 
adoption. It will only be adopted as a side-effect of some other changes 
that make people change the protocol_version setting. Or after a very 
long transition period.


That said, I think we *should* change the default for the time being, so 
that developers working on the bleeding edge and building from git get 
some exposure to it. Hopefully that will nudge some of the poolers to 
adopt NegotiateProtocolVersion sooner. But we revert the default to 3.0 
before the v18 release.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Jelte Fennema-Nio
On Fri, 16 Aug 2024 at 16:51, Heikki Linnakangas  wrote:
> That said, I think we *should* change the default for the time being, so
> that developers working on the bleeding edge and building from git get
> some exposure to it. Hopefully that will nudge some of the poolers to
> adopt NegotiateProtocolVersion sooner. But we revert the default to 3.0
> before the v18 release.

That seems reasonable to me. To be clear, the latest PgBouncer release
now supports NegotiateProtocolVersion.




Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2024-08-16 Thread Tom Lane
Nishant Sharma  writes:
> Actual column names used while creation of foreign table are not allowed
> to be an empty string, but when we use column_name as an empty string in
> OPTIONS during CREATE or ALTER of foreign tables, it is allowed.

Is this really a bug?  The valid remote names are determined by
whatever underlies the FDW, and I doubt we should assume that
SQL syntax restrictions apply to every FDW.  Perhaps it would
be reasonable to apply such checks locally in SQL-based FDWs,
but I object to assuming such things at the level of
ATExecAlterColumnGenericOptions.

More generally, I don't see any meaningful difference between
this mistake and the more common one of misspelling the remote
column name, which is something we're not going to be able
to check for (at least not in anything like this way).  If
you wanted to move the ease-of-use goalposts materially,
you should be looking for a way to do that.

regards, tom lane




Re: Make query cancellation keys longer

2024-08-16 Thread Robert Haas
On Fri, Aug 16, 2024 at 10:37 AM Heikki Linnakangas  wrote:
> If we envision accepting ranges like that in the future, it would be
> good to do now rather than later. Otherwise, if someone wants to require
> features from protocol 3.2 today, they will have to put
> "protocol_version=3.2" in the connection string, and later when 3.3
> version is released, their connection string will continue to force the
> then-old 3.2 version.

I'm totally cool with doing it now rather than later if you or someone
else is willing to do the work. But I don't see why we'd need a
protocol bump to change it later. If you write protocol_version=3.7 or
protocol_version=3.2-3.7 we send the same thing to the server either
way. It's only a difference in whether we slam the connection shut if
the server comes back and say it can only do 3.0.
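
To sketch what that would look like from the client side (hedged:
"protocol_version" and the range syntax are proposals in this thread,
not a released libpq option, so treat the connection string as
hypothetical):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Hypothetical option: ask for 3.7, but tolerate anything >= 3.2. */
	PGconn	   *conn = PQconnectdb("dbname=postgres protocol_version=3.2-3.7");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	else
		printf("negotiated: %d\n", PQprotocolVersion(conn));
	PQfinish(conn);
	return 0;
}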

> I'll split this patch like that, to make it easier to compare and merge
> with Jelte's corresponding patches.

That sounds great. IMHO, comparing and merging the patches is the next
step here and would be great to see.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel CREATE INDEX for BRIN indexes

2024-08-16 Thread Peter Eisentraut

On 16.08.24 11:22, Tomas Vondra wrote:

These pageinspect tests added a new use of the md5() function.  We got
rid of those in the tests for PG17.  You could write the test case with
something like

  SELECT (CASE WHEN (mod(i,231) = 0) OR (i BETWEEN 3500 AND 4000) THEN NULL ELSE i END),
-        (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE md5(i::text) END),
+        (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE encode(sha256(i::text::bytea), 'hex') END),
         (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3850 AND 4500) THEN NULL ELSE (i/100) + mod(i,8) END)

But this changes the test output slightly and I'm not sure if this gives
you the data distribution that you need for you test.  Could your check
this please?



I think this is fine. The output only changes because sha256 produces
longer values than md5, so that the summaries are longer the index gets
a page longer. AFAIK that has no impact on the test.


Ok, I have committed that.  Thanks for checking.





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Robert Haas
On Fri, Aug 16, 2024 at 10:51 AM Heikki Linnakangas  wrote:
> Now, I think we should still do it, but it might not warrant changing
> the default. Unfortunately that means that it will get very little
> adoption. It will only be adopted as a side-effect of some other changes
> that make people change the protocol_version setting. Or after a very
> long transition period.
>
> That said, I think we *should* change the default for the time being, so
> that developers working on the bleeding edge and building from git get
> some exposure to it. Hopefully that will nudge some of the poolers to
> adopt NegotiateProtocolVersion sooner. But we revert the default to 3.0
> before the v18 release.

I'm fine with changing the default for the time being. I'm not really
sure what I think about changing it back before release. Maybe six
months from now it will be clearer what the right thing to do is; or
maybe other people will be more certain of what The Right Thing To Do
is than I am myself; but there's no rush to finalize a decision right
this minute.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: thread-safety: gmtime_r(), localtime_r()

2024-08-16 Thread Peter Eisentraut

Here is an updated patch version.

I have changed the backend call from localtime() to gmtime() but then 
also to gmtime_r().


I moved the _POSIX_C_SOURCE definition for MinGW from the header file to 
a command-line option (-D_POSIX_C_SOURCE).  This matches the treatment 
of _GNU_SOURCE and similar.


I think this is about as good as it's going to get, and we need it to 
be, so I propose to commit this version if there are no further concerns.
From 80e25d03404a6d13ce726e07548270bf66de2792 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 16 Aug 2024 16:50:19 +0200
Subject: [PATCH v2] thread-safety: gmtime_r(), localtime_r()

Use gmtime_r() and localtime_r() instead of gmtime() and localtime(),
for thread-safety.

There are a few affected calls in libpq and ecpg's libpgtypes, which
are probably effectively bugs, because those libraries already claim
to be thread-safe.

There is one affected call in the backend.  Most of the backend
otherwise uses the custom functions pg_gmtime() and pg_localtime(),
which are implemented differently.

While we're here, change the call in the backend to gmtime*() instead
of localtime*(), since for that use time zone behavior is irrelevant,
and this side-steps any questions about when time zones are
initialized by localtime_r() vs localtime().

Portability: gmtime_r() and localtime_r() are in POSIX but are not
available on Windows.  Windows has functions gmtime_s() and
localtime_s() that can fulfill the same purpose, so we add some small
wrappers around them.  (Note that these *_s() functions are also
different from the *_s() functions in the bounds-checking extension of
C11.  We are not using those here.)

MinGW exposes neither *_r() nor *_s() by default.  You can get at the
POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately
before including <time.h>.  (There is probably also a way to get at
the Windows-style *_s() functions by supplying some additional options
or defines.  But we might as well just use the POSIX ones.)

Discussion: 
https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7...@eisentraut.org
---
 meson.build|  4 
 src/backend/utils/adt/pg_locale.c  |  3 ++-
 src/include/port/win32_port.h  |  9 +
 src/interfaces/ecpg/pgtypeslib/dt_common.c | 11 +++
 src/interfaces/ecpg/pgtypeslib/timestamp.c |  3 ++-
 src/interfaces/libpq/fe-trace.c|  3 ++-
 src/template/win32 |  3 +++
 7 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/meson.build b/meson.build
index cd711c6d018..ea07126f78e 100644
--- a/meson.build
+++ b/meson.build
@@ -268,6 +268,10 @@ elif host_system == 'windows'
   exesuffix = '.exe'
   dlsuffix = '.dll'
   library_path_var = ''
+  if cc.get_id() != 'msvc'
+# define before including <time.h> for getting localtime_r() etc. on MinGW
+cppflags += '-D_POSIX_C_SOURCE'
+  endif
 
   export_file_format = 'win'
   export_file_suffix = 'def'
diff --git a/src/backend/utils/adt/pg_locale.c 
b/src/backend/utils/adt/pg_locale.c
index cd3661e7279..0c3d0c7b5b8 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -826,6 +826,7 @@ cache_locale_time(void)
char   *bufptr;
time_t  timenow;
struct tm  *timeinfo;
+   struct tm   timeinfobuf;
	bool		strftimefail = false;
int encoding;
int i;
@@ -876,7 +877,7 @@ cache_locale_time(void)
 
/* We use times close to current time as data for strftime(). */
timenow = time(NULL);
-   timeinfo = localtime(&timenow);
+   timeinfo = gmtime_r(&timenow, &timeinfobuf);
 
/* Store the strftime results in MAX_L10N_DATA-sized portions of buf[] 
*/
bufptr = buf;
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 7ffe5891c69..27ce90ecc3d 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -420,6 +420,15 @@ extern int _pglstat64(const char *name, struct stat *buf);
  */
 #define strtok_r strtok_s
 
+/*
+ * Supplement to <time.h>.
+ */
+#ifdef _MSC_VER
+/* MinGW has these functions already */
+#define gmtime_r(clock, result) (gmtime_s(result, clock) ? NULL : (result))
+#define localtime_r(clock, result) (localtime_s(result, clock) ? NULL : 
(result))
+#endif
+
 /*
  * Locale stuff.
  *
diff --git a/src/interfaces/ecpg/pgtypeslib/dt_common.c 
b/src/interfaces/ecpg/pgtypeslib/dt_common.c
index ed08088bfe1..48074fd3ad1 100644
--- a/src/interfaces/ecpg/pgtypeslib/dt_common.c
+++ b/src/interfaces/ecpg/pgtypeslib/dt_common.c
@@ -949,9 +949,10 @@ int
 GetEpochTime(struct tm *tm)
 {
struct tm  *t0;
+   struct tm   tmbuf;
time_t  epoch = 0;
 
-   t0 = gmtime(&epoch);
+   t0 = gmtime_r(&epoch, &tmbuf);
 
if (t0)
{
@@ -973,12 +974,13 @@ abstime2tm(AbsoluteTime _time, int *tzp, struct tm *tm, 
char **tz

Re: Remove dependence on integer wrapping

2024-08-16 Thread Nathan Bossart
On Thu, Aug 15, 2024 at 10:49:46PM -0400, Joseph Koshakow wrote:
> This updated version LGTM, I agree it's a good use of pg_abs_s32().

Committed.

-- 
nathan




Re: First draft of PG 17 release notes

2024-08-16 Thread Bruce Momjian
On Wed, Jul 17, 2024 at 03:32:45PM +0900, Kisoon Kwon wrote:
> Hi,
> 
> In the PG17 release notes, I noticed it is mentioned as
> "pg_attribute.stxstattarget" which seems incorrect.
> In my opinion, it should be "pg_statistic_ext.stxstattarget" because the
> "stxstattarget" column is part of the "pg_statistic_ext" catalog.

You are right, fixed in the attached patch.  Sorry for the delay.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index e9cee92ff44..dfbf8a74cd4 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -269,7 +269,7 @@ Author: Peter Eisentraut 
  Change pg_attribute.attstattarget
  and
- pg_attribute.stxstattarget
+ pg_statistic_ext.stxstattarget
  to represent the default statistics target as NULL
  (Peter Eisentraut)
  


Re: First draft of PG 17 release notes

2024-08-16 Thread Bruce Momjian
On Fri, Jul 26, 2024 at 01:22:24PM +0900, Yugo Nagata wrote:
> I found the following in the release notes:
> 
>  Change file boundary handling of two WAL file name functions 
>  (Kyotaro Horiguchi, Andres Freund, Bruce Momjian)
> 
>  The functions pg_walfile_name() and pg_walfile_name_offset() used to report 
> the previous 
>  LSN segment number when  the LSN was on a file segment boundary; it now 
> returns the LSN segment. 
> 
> It might be trivial, but, reading the associated commit message , I think it 
> would be more explicit
> for users to rewrite the last statement to
> 
> "it now returns the current LSN segment."

Agreed, applied patch attached.  Sorry for the delay.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index dfbf8a74cd4..8a1e51f10d6 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -176,7 +176,7 @@ Author: Bruce Momjian 
  and pg_walfile_name_offset() used to report
  the previous LSN segment number when the
  LSN was on a file segment boundary;  it now
- returns the LSN segment.
+ returns the current LSN segment.
  
 
 


Re: First draft of PG 17 release notes

2024-08-16 Thread Bruce Momjian
On Thu, Aug  8, 2024 at 08:55:53AM -0500, Justin Pryzby wrote:
> > Add server variable huge_page_size to report the use of huge pages by
> 
> The new variable is huge_page_status; h_p_size is several years old.

Fixed.  I created this mistake when I was adding links to the SGML file.

> BTW, I was surprised that these were included:
> 
> +2024-02-28 [363eb0599] Convert README to Markdown.
> +2024-01-25 [7014c9a4b] Doc: improve documentation for jsonpath behavior.

I try to mention significant doc changes, and have done so in the past.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread Maciek Sakrejda
On Wed, Aug 14, 2024 at 3:40 PM Jelte Fennema-Nio  wrote:
>
> I'd like to send an automatic mail to a thread whenever it gets added
> to a commitfest. Since this would impact everyone that's subscribed to
> the mailinglist I'd like some feedback on this. This mail would
> include:
>
> 1. A very short blurb like: "This thread was added to the commitfest
> with ID 1234"
> 2. A link to the commitfest entry
> 3. A link to the cfbot CI runs
> 4. A link to the diff on GitHub
> 5. Any other suggestions?

I would find (2) very useful.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Jacob Champion
On Fri, Aug 16, 2024 at 12:05 AM Jelte Fennema-Nio  wrote:
>
> On Fri, 16 Aug 2024 at 00:39, Jacob Champion
>  wrote:
> >
> > On Thu, Aug 15, 2024 at 3:04 PM Heikki Linnakangas  wrote:
> > > Perhaps we should even change it to return
> > > 30000 for protocol version 3.0, and just leave a note in the docs like
> > > "in older versions of libpq, this returned 3 for protocol version 3.0".
> >
> > I think that would absolutely break current code. It's not uncommon
> > (IME) for hand-built clients wrapping libpq to make sure they're not
> > talking v2 before turning on some feature, and they're allowed to do
> > that with a PQprotocolVersion() == 3 check. A GitHub code search
> > brings up examples.
>
> Can you give a link for that code search and/or an example where
> someone used it like that in a real setting?

Keeping in mind that I'm responding to the change from 3 to 30000:

https://github.com/search?q=PQprotocolVersion&type=code

https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89

Bindings re-export this symbol in ways that basically just expand the
web of things to talk about. And there's hazards like


https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864

where the old client is fine, but new clients could be tricked into
writing similar checks as `>= 30000` -- which is wrong because older
libpqs use 3, haha, surprise, have fun with that!
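
To spell the two hazards out, a sketch (assuming a hypothetical future
libpq that starts returning 30000 for protocol 3.0):

#include <stdbool.h>
#include <libpq-fe.h>

/* Breaks the day the return value changes from 3 to 30000: */
static bool
check_exact(PGconn *conn)
{
	return PQprotocolVersion(conn) == 3;
}

/* Also wrong: an older libpq still returns 3 for protocol 3.0. */
static bool
check_new_style(PGconn *conn)
{
	return PQprotocolVersion(conn) >= 30000;
}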

> The only example I could
> find where someone used it at all was psycopg having a unittest for
> their python wrapper around this API, and they indeed used == 3.

I've written code that uses exact equality as well, because I cared
about the wire protocol version. Even if I hadn't, isn't the first
public example enough, since a GitHub search is going to be an
undercount? What is your acceptable level of breakage?

People who are testing against this have some reason to care about the
underlying protocol compatibility. I don't need to understand (or even
agree with!) why they care; we've exported an API that allows them to
do something they find useful.

> The advantage is not introducing yet another API when we already have
> one with a great name

Sorry to move the goalposts, but when I say "value" I'm specifically
talking about value for clients/community, not value for patch writers
(or even maintainers). A change here doesn't add any value for
existing clients when compared to a new API, since they've already
written the version check code and are presumably happy with it. New
clients that have reason to care about the minor version, whatever
that happens to mean for a protocol, can use new code.

I'm not opposed to compatibility breaks when a client can see some
value in what we've done -- but the dev being broken should at least
be able to say something like "oh yeah, it was clearly broken before
and I'm glad it works now" or "wow, what a security hole, I'm glad
they patched it". That's not true here.

libpq is close to the base of a gigantic software stack and ecosystem.
We have an API, we have an SONAME, we have ways to introduce better
APIs while not breaking past clients. (And we can collect the list of
cruft to discard in a future SONAME bump, so we don't have to carry it
forever... but is the cost of this particularly large?)

> that no-one is currently using.

This is demonstrably false. You just cited the example you found in
psycopg, and all those bindings on GitHub have clients of their own,
not all of which are going to be easily searchable.

> The current API
> is in practice just a very convoluted way of writing 3.

There are versions of libpq still in support that can return 2, and
clients above us have to write code against the whole spectrum.

> Also doing an
> == 3 check is obviously problematic, if people use this function they
> should be using > 3 to be compatible with future versions.

Depends on why they're checking. I regularly write test clients that
drop down beneath the libpq layer. I don't want to be silently
upgraded.

I think I remember some production Go-based libpq clients several
years ago that did similar things, dropping down to the packet level
to do some sort of magic, but I can't remember exactly what now.
That's terrifying in the abstract, but it's not my decision or code to
maintain. The community is allowed to do things like that; we publish
a protocol definition in addition to an API that exposes the socket.

> So if we
> ever introduce protocol version 4, then these (afaict theoretical)
> users would break anyway.

Yes -- not theoretical, since I am one of them! -- that's the point.
Since we've already demonstrated that protocol details can leak up
above the API for the 2->3 change, a dev with reason to be paranoid
(such as myself) can write a canary for the 3->4 change. "Protocol 4.0
not yet supported" can be a feature.

--Jacob




Re: Remove dependence on integer wrapping

2024-08-16 Thread Alexander Lakhin

Hello Nathan and Joe,

16.08.2024 19:52, Nathan Bossart wrote:

On Thu, Aug 15, 2024 at 10:49:46PM -0400, Joseph Koshakow wrote:

This updated version LGTM, I agree it's a good use of pg_abs_s32().

Committed.


Thank you for working on that issue!

I've tried `make check` with CC=gcc-13 CPPFLAGS="-O0 -ftrapv" and got a
server crash:
2024-08-16 17:14:36.102 UTC postmaster[1982703] LOG:  server process (PID 1983867) was 
terminated by signal 6: Aborted
2024-08-16 17:14:36.102 UTC postmaster[1982703] DETAIL:  Failed process was running: 
select '[]'::jsonb ->> -2147483648;
with the stack trace
...
#5  0x556aec224a11 in __negvsi2 ()
#6  0x556aec046238 in jsonb_array_element_text (fcinfo=0x556aedd70240) at 
jsonfuncs.c:993
#7  0x556aebc90b68 in ExecInterpExpr (state=0x556aedd70160, 
econtext=0x556aedd706a0, isnull=0x7ffdf82211e4)
    at execExprInterp.c:765
...
(gdb) f 6
#6  0x556aec046238 in jsonb_array_element_text (fcinfo=0x556aedd70240) at 
jsonfuncs.c:993
993 if (-element > nelements)
(gdb) p element
$1 = -2147483648

So it looks like jsonb_array_element_text() still needs the same
treatment as jsonb_array_element().

Moreover, I tried to use "-ftrapv" on 32-bit Debian and came across
another failure:
select '9223372036854775807'::int8 * 2147483648::int8;
server closed the connection unexpectedly
...
#4  0xb76a in __GI_abort () at ./stdlib/abort.c:79
#5  0x004cb2e1 in __mulvdi3.cold ()
#6  0x00abe7ab in pg_mul_s64_overflow (a=9223372036854775807, b=2147483648, 
result=0xbff1da68)
    at ../../../../src/include/common/int.h:264
#7  0x00abfbff in int8mul (fcinfo=0x14d9d04) at int8.c:496
#8  0x00782675 in ExecInterpExpr (state=0x14d9c4c, econtext=0x14da15c, 
isnull=0xbff1dc3f) at execExprInterp.c:765

Whilst
select '9223372036854775807'::int8 * 2147483647::int8;
emits
ERROR:  bigint out of range

I've also discovered another trap-triggering case for a 64-bit platform:
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1;

server closed the connection unexpectedly
...
#5  0x5576cfb1c9f3 in __negvdi2 ()
#6  0x5576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at 
bitmapset.c:691
#7  0x5576cf72be0f in fix_append_rel_relids (root=0x5576d09df198, varno=31, 
subrelids=0x5576d09f7fb0)
    at prepjointree.c:3830
#8  0x5576cf7278c2 in pull_up_simple_subquery (root=0x5576d09df198, 
jtnode=0x5576d09f7470, rte=0x5576d09de300,
    lowest_outer_join=0x0, containing_appendrel=0x5576d09f7368) at 
prepjointree.c:1277
...
(gdb) f 6
#6  0x5576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at 
bitmapset.c:691
691 if (result >= 0 || HAS_MULTIPLE_ONES(w))
(gdb) p/x w
$1 = 0x8000000000000000

Best regards,
Alexander




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Robert Haas
On Fri, Aug 16, 2024 at 1:44 PM Jacob Champion
 wrote:
> 
> https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89
> 
> https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864

IMHO these examples establish beyond doubt that the existing function
really is being used in ways that would break if we committed the
proposed patch. To be honest, I'm slightly surprised, because protocol
version 2 has been so dead for so long that I would not have
anticipated people would even bother checking for it. But these
examples show that some people do. If Jacob found these examples this
easily, there are probably a bunch of others.

It's not worth breaking existing code to avoid adding one new libpq
entrypoint. Let's just add the new function and move on.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread Jacob Champion
On Fri, Aug 16, 2024 at 10:23 AM Maciek Sakrejda  wrote:
> > 1. A very short blurb like: "This thread was added to the commitfest
> > with ID 1234"
> > 2. A link to the commitfest entry
> > 3. A link to the cfbot CI runs
> > 4. A link to the diff on GitHub
> > 5. Any other suggestions?
>
> I would find (2) very useful.

Me too.

--Jacob




Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread Peter Geoghegan
On Thu, Aug 15, 2024 at 9:33 AM Peter Eisentraut  wrote:
> But a more serious concern here is that the patches created by the cfbot
> are not canonical.  There are various heuristics when they get applied.

It's true that the code required for CFBot to simply apply a patch is
nontrivial. We're accounting for various edge-cases, and soldiering
on, where we deem that it makes sense. I'm referring to those cases
where "git am" won't work perfectly, but we have a pretty good chance
of successfully generating what the patch author intended (I assume
that that's what you meant by "heuristics").

One reason why this works better than might be expected is
because...we then test the result (that's the whole point, of course).
Obviously, if we apply a patch in a way that isn't quite perfectly
clean (according to whatever the "git am" criteria is), but that CFBot
is nevertheless capable of applying, and we then find that the end
result passes all tests, that gives us a fairly strong signal. We can
have high confidence that CFBot has done the right thing at that
point. We can reasonably present the resulting feature branch as a
"known good" usable version of the patch.

Of course you can quibble with this. Fundamentally, a patch file can
never be updated, but we want to apply it on top of a moving target
(as best we can). We're always making some kind of trade-off. I just
don't think that the heuristics that humans might choose to apply are
necessarily much better on average. Humans are bad at routine boring
things, but good at noticing and coping with special cases.

> I would prefer that people work with the actual patches sent by email,
> at least unless they know exactly what they are doing.  We don't want to
> create parallel worlds of patches that are like 90% similar but not
> really identical.

There's a reason why tech companies place so much emphasis on offering
a "low friction experience". The aggregate effect on user/consumer
behavior is huge. I'm not saying that this is good or bad (let's not
get into that now); just that it is an empirical fact that people tend
to behave like that. We want more reviewers. Why not try to meet
people where they are a bit more?

I have to admit that I think that I'd be far more likely to quickly
test out a patch if I'm presented with a workflow that makes the setup
as painless as possible. Particularly if I'm all but guaranteed to get
a working Postgres server with the patch applied (which is possible
when I'm building exactly the same feature branch as the one that
passed CI by CFBot). I'm entirely willing to believe that it wouldn't
work that way for you, but I'm confident that a lot of people are in
the same camp as me.

-- 
Peter Geoghegan




Re: Remove dependence on integer wrapping

2024-08-16 Thread Nathan Bossart
On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
> So it looks like jsonb_array_element_text() still needs the same
> treatment as jsonb_array_element().

D'oh.  I added a test for that but didn't actually fix the code.  I think
we just need something like the following.

diff --git a/src/backend/utils/adt/jsonfuncs.c 
b/src/backend/utils/adt/jsonfuncs.c
index 1f8ea51e6a..69cdd84393 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -990,7 +990,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 {
 uint32  nelements = JB_ROOT_COUNT(jb);

-if (-element > nelements)
+if (pg_abs_s32(element) > nelements)
 PG_RETURN_NULL();
 else
 element += nelements;
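
For anyone following along, the general shape of the trick (the real
pg_abs_s32() lives in src/include/common/int.h; this is only a sketch
of the unsigned-negation pattern, not the exact in-tree code):

#include <stdint.h>

static inline uint32_t
abs_s32_sketch(int32_t a)
{
	/*
	 * Negate in unsigned arithmetic: 0 - (uint32_t) INT32_MIN is
	 * well-defined and yields 2147483648, whereas -a would be signed
	 * overflow (and an abort under -ftrapv).
	 */
	return (a < 0) ? (0 - (uint32_t) a) : (uint32_t) a;
}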

> Moreover, I tried to use "-ftrapv" on 32-bit Debian and came across
> another failure:
> select '9223372036854775807'::int8 * 2147483648::int8;
> server closed the connection unexpectedly
> ...
> #4  0xb76a in __GI_abort () at ./stdlib/abort.c:79
> #5  0x004cb2e1 in __mulvdi3.cold ()
> #6  0x00abe7ab in pg_mul_s64_overflow (a=9223372036854775807, b=2147483648, 
> result=0xbff1da68)
>     at ../../../../src/include/common/int.h:264
> #7  0x00abfbff in int8mul (fcinfo=0x14d9d04) at int8.c:496
> #8  0x00782675 in ExecInterpExpr (state=0x14d9c4c, econtext=0x14da15c, 
> isnull=0xbff1dc3f) at execExprInterp.c:765

Hm.  It looks like that is pointing to __builtin_mul_overflow(), which
seems strange.

> #6  0x5576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at 
> bitmapset.c:691
> 691 if (result >= 0 || HAS_MULTIPLE_ONES(w))

At a glance, this appears to be caused by the RIGHTMOST_ONE macro:

	#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))

-- 
nathan




Re: Remove dependence on integer wrapping

2024-08-16 Thread Nathan Bossart
On Fri, Aug 16, 2024 at 01:35:01PM -0500, Nathan Bossart wrote:
> On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> #6  0x5576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at 
>> bitmapset.c:691
>> 691 if (result >= 0 || HAS_MULTIPLE_ONES(w))
> 
> At a glance, this appears to be caused by the RIGHTMOST_ONE macro:
> 
> 	#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))

I believe hand-rolling the two's complement calculation should be
sufficient to avoid depending on -fwrapv here.  godbolt.org indicates that
it produces roughly the same code, too.

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index cd05c642b0..d37a997c0e 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -67,7 +67,7 @@
  * we get zero.
  *--
  */
-#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))
+#define RIGHTMOST_ONE(x) ((bitmapword) (x) & (~((bitmapword) (x)) + 1))

 #define HAS_MULTIPLE_ONES(x)	((bitmapword) RIGHTMOST_ONE(x) != (x))
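
A standalone illustration of the trap and the fix (assuming a 64-bit
bitmapword; this is a sketch, not the bitmapset.c code):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t	w = UINT64_C(1) << 63;	/* only the top bit set */

	/*
	 * The old form, -(int64_t) w, negates INT64_MIN here: signed
	 * overflow, which is undefined behavior and aborts under -ftrapv.
	 * The unsigned two's-complement form is always well-defined:
	 */
	uint64_t	rightmost = w & (~w + 1);

	printf("0x%016llx\n", (unsigned long long) rightmost);
	return 0;
}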

-- 
nathan




Re: macOS prefetching support

2024-08-16 Thread Peter Eisentraut

On 14.08.24 16:39, Peter Eisentraut wrote:

On 14.08.24 14:36, Thomas Munro wrote:
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut  wrote:

Attached is a patch to implement this.  It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.


Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__.  I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool. I used "select count(pg_buffercache_evict(bufferid))
from pg_buffercache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:

maintenance_io_concurrency=0,  ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE:  652ms
maintenance_io_concurrency=25, ANALYZE:  389ms

It works!


Cool!  I'll work on a more polished patch.


Here it is.

Some interesting questions:

What to do about the order of the symbols and include files.  I threw 
something into src/include/port/darwin.h, but I'm not sure if that's 
good.  Alternatively, we could not use __darwin__ but instead the more 
standard and predefined defined(__APPLE__) && defined(__MACH__).


How to document it.  The current documentation makes references mainly 
to the availability of posix_fadvise().  That seems quite low-level. 
How could a user of a prepared package even find out about that?  Should 
we just say "requires OS support" (kind of like I did here) and you can 
query the effective state by looking at the *_io_concurrency settings? 
Or do we need a read-only parameter that shows whether prefetch support 
exists (kind of along the lines of huge pages)?


Btw., for context, here is what I gather the prefetch support (with this 
patch) is:


cygwin  posix_fadvise
darwin  fcntl
freebsd posix_fadvise
linux   posix_fadvise
netbsd  posix_fadvise
openbsd no
solaris fake
win32   no

(There is also the possibility that we provide an implementation of 
posix_fadvise() for macOS that wraps the platform-specific code in this 
patch.  And then Apple could just take that. ;-) )
From d318c4108b9e05a4828f6e7f71af34c3ca89b3ed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 16 Aug 2024 20:15:43 +0200
Subject: [PATCH v2] Add prefetching support on macOS

macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE
command does the same thing.

Discussion: 
https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org
---
 doc/src/sgml/config.sgml| 14 +++-
 doc/src/sgml/wal.sgml   |  4 +--
 src/backend/commands/variable.c |  4 +--
 src/backend/storage/file/fd.c   | 59 ++---
 src/include/pg_config_manual.h  |  7 ++--
 src/include/port/darwin.h   |  5 +++
 6 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2937384b001..c6d2fa2148e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2679,11 +2679,10 @@ Asynchronous Behavior
 
 
 
- Asynchronous I/O depends on an effective 
posix_fadvise
- function, which some operating systems lack.  If the function is not
- present then setting this parameter to anything but zero will result
- in an error.  On some operating systems (e.g., Solaris), the function
- is present but does not actually do anything.
+ Asynchronous I/O depends on an effective support by the operating
+ system, which some operating systems lack.  If there is no operating
+ system support then setting this parameter to anything but zero will
+ result in an error.
 
 
 
@@ -3852,10 +3851,7 @@ Recovery
 off, on and
 try (the default).  The setting
 try enables
-prefetching only if the operating system provides the
-posix_fadvise function, which is currently used
-to implement prefetching.  Note that some operating systems provide the
-function, but it doesn't do anything.
+prefetching only if the operating system provides prefetching support.


 Prefetching blocks that will soon be needed can reduce I/O wait times
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d5df65bc693..72b73dbf113 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -841,8 +841,8 @@ WAL Configuration
The  and
 settings limit prefetching
concurrency and distance, respectively.  By default, it is set to
-   try, which enables the feature on systems where
-   posix_fadvise is available.
+   try, which enables the feature on systems that have
+   prefetching support.
   
  
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 6202c5ebe44..c1c6c2811c9 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1212,7 +12

Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Heikki Linnakangas

On 16/08/2024 21:01, Robert Haas wrote:

On Fri, Aug 16, 2024 at 1:44 PM Jacob Champion
 wrote:

 
https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89
 
https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864


IMHO these examples establish beyond doubt that the existing function
really is being used in ways that would break if we committed the
proposed patch. To be honest, I'm slightly surprised, because protocol
version 2 has been so dead for so long that I would not have
anticipated people would even bother checking for it. But these
examples show that some people do. If Jacob found these examples this
easily, there are probably a bunch of others.

It's not worth breaking existing code to avoid adding one new libpq
entrypoint. Let's just add the new function and move on.


+1. Jacob is right.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Dave Cramer
On Fri, 16 Aug 2024 at 15:26, Heikki Linnakangas  wrote:

> On 16/08/2024 21:01, Robert Haas wrote:
> > On Fri, Aug 16, 2024 at 1:44 PM Jacob Champion
> >  wrote:
> >>
> https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89
> >>
> https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864
> >
> > IMHO these examples establish beyond doubt that the existing function
> > really is being used in ways that would break if we committed the
> > proposed patch. To be honest, I'm slightly surprised, because protocol
> > version 2 has been so dead for so long that I would not have
> > anticipated people would even bother checking for it. But these
> > examples show that some people do. If Jacob found these examples this
> > easily, there are probably a bunch of others.
> >
> > It's not worth breaking existing code to avoid adding one new libpq
> > entrypoint. Let's just add the new function and move on.
>
> +1. Jacob is right.
>

For those of us who don't use libpq, how will this work?

Dave


Re: pg_verifybackup: TAR format backup verification

2024-08-16 Thread Robert Haas
On Wed, Aug 14, 2024 at 12:44 PM Robert Haas  wrote:
> On Wed, Aug 14, 2024 at 9:20 AM Amul Sul  wrote:
> > I agree with keeping verify_backup_file() separate, but I'm hesitant
> > about doing the same for verify_backup_directory().
>
> I don't have time today to go through your whole email or re-review
> the code, but I plan to circle back to that at a later time, However,
> I want to respond to this point in the meanwhile.

I have committed 0004 (except that I changed a comment) and 0005
(except that I didn't move READ_CHUNK_SIZE).

Looking at the issue mentioned above again, I agree that the changes
in verify_backup_directory() don't look overly invasive in this
version. I'm still not 100% convinced it's the right
call, but it doesn't seem bad.

You have a spurious comment change to the header of verify_plain_backup_file().

I feel like the naming of tarFile and tarFiles is not consistent with
the overall style of this file.

I don't like this:

[robert.haas ~]$ pg_verifybackup btar
pg_verifybackup: error: pg_waldump does not support parsing WAL files
from a tar archive.
pg_verifybackup: hint: Try "pg_verifybackup --help" to skip parse WAL
files option.

The hint seems like it isn't really correct grammar, and I also don't
see why we can't be more specific. How about "You must use -n,
--no-parse-wal when verifying a tar-format backup."?

The primary message seems a bit redundant, because parsing WAL files
is the only thing pg_waldump does. How about "pg_waldump cannot read
from a tar archive"? Note that primary error messages should not end
in a period (while hint and detail messages should).

+int64 num = strtoi64(relpath, &suffix, 10);



+if (suffix == NULL || (num <= 0) || (num > OID_MAX))

Seems like too many parentheses.



-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Heikki Linnakangas

On 16/08/2024 22:45, Dave Cramer wrote:
On Fri, 16 Aug 2024 at 15:26, Heikki Linnakangas  wrote:


On 16/08/2024 21:01, Robert Haas wrote:
 > On Fri, Aug 16, 2024 at 1:44 PM Jacob Champion
 >  wrote:
 >>

https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89
 >>
https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864
 >
 > IMHO these examples establish beyond doubt that the existing function
 > really is being used in ways that would break if we committed the
 > proposed patch. To be honest, I'm slightly surprised, because
protocol
 > version 2 has been so dead for so long that I would not have
 > anticipated people would even bother checking for it. But these
 > examples show that some people do. If Jacob found these examples this
 > easily, there are probably a bunch of others.
 >
 > It's not worth breaking existing code to avoid adding one new libpq
 > entrypoint. Let's just add the new function and move on.

+1. Jacob is right.


For those of us who don't use libpq, how will this work?


Sorry, I don't understand the question. This sub-thread is all about the 
libpq PQprotocolVersion() function.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Dave Cramer
On Fri, 16 Aug 2024 at 15:54, Heikki Linnakangas  wrote:

> On 16/08/2024 22:45, Dave Cramer wrote:
> On Fri, 16 Aug 2024 at 15:26, Heikki Linnakangas  wrote:
> >
> > On 16/08/2024 21:01, Robert Haas wrote:
> >  > On Fri, Aug 16, 2024 at 1:44 PM Jacob Champion
> >  >  wrote:
> >  >>
> >
> https://github.com/psycopg/psycopg2/blob/658afe4cd90d3e167d7c98d22824a8d6ec895b1c/tests/test_async.py#L89
> >  >>
> >
> https://github.com/infusion/PHP/blob/7ebefb6426bb4b4820a30cca5c3a10bfd757b6ea/ext/pgsql/pgsql.c#L864
> >  >
> >  > IMHO these examples establish beyond doubt that the existing
> function
> >  > really is being used in ways that would break if we committed the
> >  > proposed patch. To be honest, I'm slightly surprised, because
> > protocol
> >  > version 2 has been so dead for so long that I would not have
> >  > anticipated people would even bother checking for it. But these
> >  > examples show that some people do. If Jacob found these examples
> this
> >  > easily, there are probably a bunch of others.
> >  >
> >  > It's not worth breaking existing code to avoid adding one new
> libpq
> >  > entrypoint. Let's just add the new function and move on.
> >
> > +1. Jacob is right.
> >
> >
> > For those of us who don't use a function. How will this work ?
>
> Sorry, I don't understand the question. This sub-thread is all about the
> libpq PQprotocolVersion() function.
>

Admittedly I'm a bit late into this discussion so I may be off base.
Ultimately we need to negotiate the protocol. From what I can tell for
libpq we are providing a function that returns a number, currently 3.

The proposal is to change it to something like 30001.

Ultimately this has to go over the wire so that clients that are
implementing the protocol themselves can respond to the new behaviour.

Wouldn't we have to send this number in the protocol negotiation ?

Dave

>
>


Re: pg_verifybackup: TAR format backup verification

2024-08-16 Thread Robert Haas
On Fri, Aug 16, 2024 at 3:53 PM Robert Haas  wrote:
> +int64 num = strtoi64(relpath, &suffix, 10);

Hit send too early. Here, seems like this should be strtoul(), not strtoi64().

The documentation of --format seems to be cut-and-pasted from
pg_basebackup and the language isn't really appropriate here. e.g.
"The main data directory's contents will be written to a file
named..." but pg_verifybackup writes nothing.

+simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");

Why not make the same logic that recognizes base or an OID also
recognize pg_wal as a prefix, and identify that as the WAL archive?
For now we'll have to skip it, but if you do it that way then if we
add future support for more suffixes, it'll just work, whereas this
way won't. And you'd need that code anyway if we ever can run
pg_waldump on a tarfile, because you would need to identify the
compression method. Note that the danger of the list of suffixes
getting out of sync here is not hypothetical: you added .tgz elsewhere
but not here.
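
A rough sketch of the suggested prefix matching, with invented names
(not code from the patch):

    /* Recognize the WAL archive by prefix, like "base" and OIDs are today. */
    if (strncmp(member, "pg_wal", strlen("pg_wal")) == 0)
    {
        /* remainder identifies the format: ".tar", ".tar.gz", ".tgz", ... */
        const char *suffix = member + strlen("pg_wal");

        mark_as_wal_archive(backup, suffix);    /* invented helper */
    }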

There's probably more to look at here but I'm running out of energy for today.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Remove dependence on integer wrapping

2024-08-16 Thread Nathan Bossart
On Fri, Aug 16, 2024 at 01:35:01PM -0500, Nathan Bossart wrote:
> On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> Sp it looks like jsonb_array_element_text() still needs the same
>> treatment as jsonb_array_element().
> 
> D'oh.  I added a test for that but didn't actually fix the code.  I think
> we just need something like the following.
> 
> diff --git a/src/backend/utils/adt/jsonfuncs.c 
> b/src/backend/utils/adt/jsonfuncs.c
> index 1f8ea51e6a..69cdd84393 100644
> --- a/src/backend/utils/adt/jsonfuncs.c
> +++ b/src/backend/utils/adt/jsonfuncs.c
> @@ -990,7 +990,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
>  {
>  uint32  nelements = JB_ROOT_COUNT(jb);
> 
> -if (-element > nelements)
> +if (pg_abs_s32(element) > nelements)
>  PG_RETURN_NULL();
>  else
>  element += nelements;

I've committed this one.
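
For the archives: the old test negated a signed value, which overflows
for INT32_MIN. A sketch of the idea behind pg_abs_s32() (not necessarily
its exact implementation):

    static inline uint32
    abs_s32_sketch(int32 x)
    {
        /* negate in unsigned arithmetic, well-defined even for INT32_MIN */
        return (x < 0) ? 0 - (uint32) x : (uint32) x;
    }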

-- 
nathan




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-08-16 Thread Robert Haas
On Fri, Aug 16, 2024 at 4:03 PM Dave Cramer  wrote:
> Admittedly I'm a bit late into this discussion so I may be off base.
> Ultimately we need to negotiate the protocol. From what I can tell for libpq 
> we are providing a function that returns a number, currently 3.
>
> The proposal is to change it to something like 30001.
>
> Ultimately this has to go over the wire so that clients that are implementing 
> the protocol themselves can respond to the new behaviour.
>
> Wouldn't we have to send this number in the protocol negotiation ?

See the discussion of the NegotiateProtocolVersion message which has
been around for a long time but is still not supported by all clients.

https://www.postgresql.org/docs/current/protocol.html
https://www.postgresql.org/docs/current/protocol-message-formats.html

No changes to the format of that message are proposed. The startup
message also contains a version number, and changes to the format of that
message are not proposed either.
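
For context, a rough outline of the existing handshake, paraphrasing the
docs linked above (not a new proposal):

    StartupMessage:            Int32 length, Int32 version (3.0 = 3 << 16 | 0),
                               then parameter name/value pairs
    NegotiateProtocolVersion:  Byte1('v'), Int32 length,
                               Int32 newest minor version the server supports,
                               Int32 count of unrecognized options, their names

A client that requests, say, 3.2 and gets back 'v' naming minor version 0
knows to fall back to plain 3.0.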

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: thread-safety: gmtime_r(), localtime_r()

2024-08-16 Thread Thomas Munro
On Sat, Aug 17, 2024 at 3:43 AM Peter Eisentraut  wrote:
> I moved the _POSIX_C_SOURCE definition for MinGW from the header file to
> a command-line option (-D_POSIX_C_SOURCE).  This matches the treatment
> of _GNU_SOURCE and similar.

I was trying to figure out what else -D_POSIX_C_SOURCE does to MinGW.
Enables __USE_MINGW_ANSI_STDIO, apparently, but I don't know if we
were using that already, or if it matters.  I suppose if it ever shows
up as a problem, we can explicitly disable it.
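
As background, the thread-safety difference in one sketch (plain POSIX
usage, nothing MinGW-specific):

    struct tm   tm;
    time_t      t = time(NULL);

    /* gmtime() hands back a shared static buffer; gmtime_r() fills
     * caller-provided storage, so concurrent threads can't clobber it */
    if (gmtime_r(&t, &tm) == NULL)
        elog(ERROR, "gmtime_r failed");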

. o O ( MinGW is a strange beast.  Do we want to try to keep the code
it runs as close as possible to what is used by MSVC?  I thought so,
but we can't always do that due to missing interfaces (though I
suspect that many #ifdef _MSC_VER tests are based on ancient versions
and now bogus).  But it also offers ways to be more POSIX-y if we
want, and then we have to decide whether to take them, and make it
more like a separate platform with different quirks... )

> I think this is about as good as it's going to get, and we need it to
> be, so I propose to commit this version if there are no further concerns.

LGTM.




Re: macOS prefetching support

2024-08-16 Thread Thomas Munro
On Sat, Aug 17, 2024 at 6:58 AM Peter Eisentraut  wrote:
> What to do about the order of the symbols and include files.  I threw
> something into src/include/port/darwin.h, but I'm not sure if that's
> good.  Alternatively, we could not use __darwin__ but instead the more
> standard and predefined defined(__APPLE__) && defined(__MACH__).

Hmm.  fd.h and fd.c test for F_NOCACHE, which is pretty closely
related.  Now I'm wondering why we actually need this in
pg_config_manual.h at all.  Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0?  Can
we just teach fd.h to define USE_PREFETCH if
defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)?
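
Something like this minimal sketch, if I read the suggestion right (not
committed code):

    /* in fd.h: prefetch support exists iff the OS offers an advice call */
    #if defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)
    #define USE_PREFETCH
    #endif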

(I have also thought multiple times about removing the configure
probes for F_FULLFSYNC, and just doing #ifdef.  Oh, that's in my patch
for CF #4453.)

> How to document it.  The current documentation makes references mainly
> to the availability of posix_fadvise().  That seems quite low-level.
> How could a user of a prepared package even find out about that?  Should
> we just say "requires OS support" (kind of like I did here) and you can
> query the effective state by looking at the *_io_concurrency settings?
> Or do we need a read-only parameter that shows whether prefetch support
> exists (kind of along the lines of huge pages)?

I think that's fine.  I don't really like the word "prefetch"; it could
mean many different things.  What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?  And yeah the GUCs being nailed to zero means you
don't have it.

> (There is also the possibility that we provide an implementation of
> posix_fadvise() for macOS that wraps the platform-specific code in this
> patch.  And then Apple could just take that. ;-) )

Yeah might be simpler... as long as we make sure that we test for
having the function AND the relevant POSIX_FADV_xxx in places, which I
see that we do.
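
For the archive, a hedged sketch of what such a wrapper could look like
on macOS (function name invented, error handling minimal):

    #include <fcntl.h>

    static int
    fadvise_willneed_sketch(int fd, off_t offset, int nbytes)
    {
        struct radvisory ra;

        ra.ra_offset = offset;
        ra.ra_count = nbytes;   /* F_RDADVISE takes an int byte count */

        /* ask the kernel to read the range into the file cache */
        return fcntl(fd, F_RDADVISE, &ra);
    }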




Re: macOS prefetching support

2024-08-16 Thread Tom Lane
Thomas Munro  writes:
> Hmm.  fd.h and fd.c test for F_NOCACHE, which is pretty closely
> related.  Now I'm wondering why we actually need this in
> pg_config_manual.h at all.  Who would turn it off at compile time, and
> why would they not be satisfied with setting relevant GUCs to 0?

+1 for not bothering with a pg_config_manual.h knob.

regards, tom lane




Re: Ineffective Assert-check in CopyMultiInsertInfoNextFreeSlot()

2024-08-16 Thread David Rowley
On Fri, 16 Aug 2024 at 23:48, Amul Sul  wrote:
> The Assert(buffer != NULL) is placed after the buffer is accessed,
> which could lead to a segmentation fault before the check is executed.

Yeah, that's not great. Technically the Assert does not add any value
in terms of catching bugs in the code, but it's probably useful to
keep it for code documentation purposes. A crash would occur even if
the Assert wasn't there.
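
The shape of the problem, as a sketch with invented names:

    slot = buffer->slots[buffer->nused];    /* dereference happens here... */
    Assert(buffer != NULL);                 /* ...so this can never fire */

The fix is simply to perform the Assert before the first dereference.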

> Attached a small patch to correct that.

Pushed.

David




Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-16 Thread David Rowley
On Thu, 15 Aug 2024 at 10:40, Jelte Fennema-Nio  wrote:
>
> I'd like to send an automatic mail to a thread whenever it gets added
> to a commitfest. Since this would impact everyone that's subscribed to
> the mailinglist I'd like some feedback on this. This mail would
> include:
>
> 1. A very short blurb like: "This thread was added to the commitfest
> with ID 1234"
> 2. A link to the commitfest entry

One thing I have seen which #2 would have helped with was that during
a CF, a patch author posted a trivial patch to -hackers and I
committed it. I didn't know the patch author had added a CF entry to
the *next* CF. I seldom think to look at the next CF page during a CF.
The entry was closed by the patch author in this case.

For that reason, I think #2 would be useful.

> 3. A link to the cfbot CI runs
> 4. A link to the diff on GitHub
> 5. Any other suggestions?

I don't really object, but I think a better aim would be to merge
CFbot with CF so that we could get to those places from the CF entry.

David




Re: Streaming I/O, vectored I/O (WIP)

2024-08-16 Thread Bruce Momjian
On Wed, Jul 10, 2024 at 07:21:59PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> It seems that Heikki's 'v9.heikki-0007-Trivial-comment-fixes.patch'
> [1] is partially applied, the top comment is not updated. The attached
> patch just updates it.
> 
> [1] 
> https://www.postgresql.org/message-id/289a1c0e-8444-4009-a8c2-c2d77ced6f07%40iki.fi

Thanks, patch applied to master.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: macOS prefetching support

2024-08-16 Thread Thomas Munro
On Sat, Aug 17, 2024 at 6:58 AM Peter Eisentraut  wrote:
> solaris fake

I'm half tempted to suggest that we take this exception out.  If it's
there, we call it.  It doesn't do anything right now, but it's a cheap
empty user space function, and I heard they are thinking about adding
a real syscall.  (In a burst of enthusiasm for standards and stuff, I
talked to people involved in all the OSes using ZFS, to suggest
hooking that up; they added some other stuff I asked about so I think
it's a real threat.)  But I haven't noticed any users on this list in
many years, to opine either way.

It doesn't do anything on Cygwin either.  Actually, it's worse than
nothing, it looks like it makes two useless system calls adjusting the
"sequential" flag on the file handle. But really, of the 3 ways to
build on Windows, only MSVC has real users, so it makes no useless
system calls at all, so I don't propose to change anything due to this
observation.

> win32   no

I guess you could implement this in two ways:

* CreateFileMapping(), MapViewOfFile(), PrefetchVirtualMemory(),
UnmapViewOfFile(), CloseHandle().  That's a lot of system calls and maybe
expensive VM stuff, who knows.

* ReadFileEx() in OVERLAPPED mode (async), into a dummy buffer, with a
completion callback that frees the buffer and OVERLAPPED object.
That's a lot of extra work allocating memory, copying to user space,
scheduling a callback, and freeing memory for nothing.

Both are terrible, but likely still better than an I/O stall, I dunno.
I think the VMS, NT, Unix-hater view would be: why would you want such
a stupid programming interface anyway, when you could use async I/O
properly?  I love Unix, but they'd have a point.
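
For completeness, a hedged sketch of the first approach (names and error
handling invented; MapViewOfFile also wants allocation-granularity-aligned
offsets, glossed over here):

    #include <windows.h>
    #include <stdbool.h>

    static bool
    prefetch_sketch(HANDLE file, ULONGLONG off, SIZE_T nbytes)
    {
        WIN32_MEMORY_RANGE_ENTRY range;
        HANDLE      map;
        void       *view;
        bool        ok = false;

        map = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (map == NULL)
            return false;
        view = MapViewOfFile(map, FILE_MAP_READ,
                             (DWORD) (off >> 32), (DWORD) off, nbytes);
        if (view != NULL)
        {
            range.VirtualAddress = view;
            range.NumberOfBytes = nbytes;
            /* Windows 8+: asks the memory manager to page the range in */
            ok = PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0) != 0;
            UnmapViewOfFile(view);
        }
        CloseHandle(map);
        return ok;
    }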




Re: race condition in pg_class

2024-08-16 Thread Noah Misch
Thanks for reviewing.

On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:
> On 14/07/2024 20:48, Noah Misch wrote:
> > I've pushed the two patches for your reports.  To placate cfbot, I'm 
> > attaching
> > the remaining patches.
> 
> inplace090-LOCKTAG_TUPLE-eoxact-v8.patch: Makes sense. A comment would be in
> order, it looks pretty random as it is. Something like:
> 
> /*
>  * Tuple locks are currently only held for short durations within a
>  * transaction. Check that we didn't forget to release one.
>  */

Will add.

> inplace110-successors-v8.patch: Makes sense.
> 
> The README changes would be better as part of the third patch, as this patch
> doesn't actually do any of the new locking described in the README, and it
> fixes the "inplace update updates wrong tuple" bug even without those tuple
> locks.

That should work.  Will confirm.

> > + * ... [any slow preparation not requiring oldtup] ...
> > + * heap_inplace_update_scan([...], &tup, &inplace_state);
> > + * if (!HeapTupleIsValid(tup))
> > + * elog(ERROR, [...]);
> > + * ... [buffer is exclusive-locked; mutate "tup"] ...
> > + * if (dirty)
> > + * heap_inplace_update_finish(inplace_state, tup);
> > + * else
> > + * heap_inplace_update_cancel(inplace_state);
> 
> I wonder if the functions should be called "systable_*" and placed in
> genam.c rather than in heapam.c. The interface looks more like the existing
> systable functions. It feels like a modularity violation for a function in
> heapam.c to take an argument like "indexId", and call back into systable_*
> functions.

Yes, _scan() and _cancel() especially are wrappers around systable.  Some API
options follow.  Any preference or other ideas?

 direct s/heap_/systable_/ rename

 systable_inplace_update_scan([...], &tup, &inplace_state);
 if (!HeapTupleIsValid(tup))
     elog(ERROR, [...]);
 ... [buffer is exclusive-locked; mutate "tup"] ...
 if (dirty)
     systable_inplace_update_finish(inplace_state, tup);
 else
     systable_inplace_update_cancel(inplace_state);

 make the first and last steps more systable-like

 systable_inplace_update_begin([...], &tup, &inplace_state);
 if (!HeapTupleIsValid(tup))
     elog(ERROR, [...]);
 ... [buffer is exclusive-locked; mutate "tup"] ...
 if (dirty)
     systable_inplace_update(inplace_state, tup);
 systable_inplace_update_end(inplace_state);

 no systable_ wrapper for middle step, more like CatalogTupleUpdate

 systable_inplace_update_begin([...], &tup, &inplace_state);
 if (!HeapTupleIsValid(tup))
     elog(ERROR, [...]);
 ... [buffer is exclusive-locked; mutate "tup"] ...
 if (dirty)
     heap_inplace_update(relation,
                         systable_inplace_old_tuple(inplace_state),
                         tup,
                         systable_inplace_buffer(inplace_state));
 systable_inplace_update_end(inplace_state);

> > /*--
> >  * XXX A crash here can allow datfrozenxid() to get ahead of 
> > relfrozenxid:
> >  *
> >  * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
> >  * ["R" is a VACUUM tbl]
> >  * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
> >  * D: systable_getnext() returns pg_class tuple of tbl
> >  * R: memcpy() into pg_class tuple of tbl
> >  * D: raise pg_database.datfrozenxid, XLogInsert(), finish
> >  * [crash]
> >  * [recovery restores datfrozenxid w/o relfrozenxid]
> >  */
> 
> Hmm, that's a tight race, but feels bad to leave it unfixed. One approach
> would be to modify the tuple on the buffer only after WAL-logging it. That
> way, D cannot read the updated value before it has been WAL logged. Just
> need to make sure that the change still gets included in the WAL record.
> Maybe something like:
> 
> if (RelationNeedsWAL(relation))
> {
>     /*
>      * Make a temporary copy of the page that includes the change, in
>      * case a full-page image is logged
>      */
>     PGAlignedBlock tmppage;
>
>     memcpy(tmppage.data, page, BLCKSZ);
>
>     /* copy the tuple to the temporary copy */
>     memcpy(...);
>
>     XLogRegisterBlock(0, ..., tmppage, REGBUF_STANDARD);
>     XLogInsert();
> }
>
> /* copy the tuple to the buffer */
> memcpy(...);

Yes, that's the essence of
inplace180-datfrozenxid-overtakes-relfrozenxid-v1.patch from
https://postgr.es/m/flat/20240620012908.92.nmi...@google.com.

> >   pg_class heap_inplace_update_scan() callers: before the call, acquire
> >   LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), 
> > ShareUpdateExclusiveLock
> >   (VACUUM), or a mode with strictly more conflicts.  If the update targets a
> >   row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must 
> > be
> >   on the table.  Locking the index rel is optional.  (This allows VACUUM to
> >   overwrite per-index pg_class while holding a lock on the table alone.)  We
> >   could allow weaker locks, in which case the ne

Re: Speed up Hash Join by teaching ExprState about hashing

2024-08-16 Thread David Rowley
On Thu, 15 Aug 2024 at 19:50, Alexey Dvoichenkov  wrote:
> I gave v3 another look. One tiny thing I've noticed is that you
> removed ExecHashGetHashValue() but not its forward declaration in
> include/executor/nodeHash.h

Fixed

> I also reviewed the JIT code this time, it looks reasonable to
> me. I've added names to some variables to make the IR easier to
> read. (Probably best to squash it into your patch, if you want to
> apply this.)

Thanks. I've included that.

I made another complete pass over this today and I noticed that there
were a few cases where I wasn't properly setting resnull and resvalue
to (Datum) 0.
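
(For anyone following along, the executor convention being fixed, roughly:

    *op->resnull = true;
    *op->resvalue = (Datum) 0;  /* don't leave a stale Datum behind */

so that later steps never read garbage out of resvalue when the input was
NULL.)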

I'm happy with the patch now. I am aware nothing currently uses
EEOP_HASHDATUM_SET_INITVAL, but I want to get moving with the Hash
Aggregate usages of this code fairly quickly and I'd rather get the
ExprState step code done now and not have to change it again.

v4 patch attached. If nobody else wants to look at this then I'm
planning on pushing it soon.

David


v4-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patch
Description: Binary data