Re: Simplify documentation related to Windows builds

2024-01-30 Thread Michael Paquier
On Fri, Jan 19, 2024 at 06:11:40AM -0500, Andrew Dunstan wrote:
> FYI Strawberry was a bit stuck for a while at 5.32, but they are now up to
> 5.38. See 
>
> I agree we shouldn't be recommending any particular perl distro, especially
> not ASPerl which now has annoying license issues.

The more I think about this thread, the more I'd tend to wipe out most
of "windows-requirements", for the sole reason that it is the wild west
regarding the various ways it is possible to get the dependencies we
need for the build and at runtime.  We could keep it minimal, with the
set of version requirements we are listing for meson:
https://www.postgresql.org/docs/devel/install-requirements.html

Then we could have one sentence recommending one, or at most two,
facilities used by the buildfarm, like https://www.msys2.org/ or
Chocolatey, as these basically group all the dependencies we need for a
meson build (right?), while linking back to the meson page about the
version requirements.

One issue I have with the meson page listing the requirements is that
we don't directly mention Diff, but that's critical for the tests.

Thoughts?
--
Michael




why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Pavel Stehule
Hi

I have one question: what is blocking the implementation of some variant of
VACUUM FULL that works like REINDEX CONCURRENTLY? Why can a mechanism similar
to the one used by REINDEX CONCURRENTLY not be used for VACUUM FULL?

Regards

Pavel


Re: Improve eviction algorithm in ReorderBuffer

2024-01-30 Thread Masahiko Sawada
On Fri, Jan 26, 2024 at 5:36 PM Masahiko Sawada  wrote:
>
> On Wed, Dec 20, 2023 at 12:11 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 20, 2023 at 6:49 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Dec 19, 2023 at 8:02 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Dec 19, 2023 at 8:31 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Sun, Dec 17, 2023 at 11:40 AM Amit Kapila 
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > > The individual transactions shouldn't cross
> > > > > > 'logical_decoding_work_mem'. I got a bit confused by your proposal 
> > > > > > to
> > > > > > maintain the lists: "...splitting it into two lists: transactions
> > > > > > consuming 5% < and 5% >=  of the memory limit, and checking the 5% 
> > > > > > >=
> > > > > > list preferably.". In the previous sentence, what did you mean by
> > > > > > transactions consuming 5% >= of the memory limit? I got the 
> > > > > > impression
> > > > > > that you are saying to maintain them in a separate transaction list
> > > > > > which doesn't seems to be the case.
> > > > >
> > > > > I wanted to mean that there are three lists in total: the first one
> > > > > maintain the transactions consuming more than 10% of
> > > > > logical_decoding_work_mem,
> > > > >
> > > >
> > > > How can we have multiple transactions in the list consuming more than
> > > > 10% of logical_decoding_work_mem? Shouldn't we perform serialization
> > > > before any xact reaches logical_decoding_work_mem?
> > >
> > > Well, suppose logical_decoding_work_mem is set to 64MB, transactions
> > > consuming more than 6.4MB are added to the list. So for example, it's
> > > possible that the list has three transactions each of which are
> > > consuming 10MB while the total memory usage in the reorderbuffer is
> > > still 30MB (less than logical_decoding_work_mem).
> > >
> >
> > Thanks for the clarification. I misunderstood the list to have
> > transactions greater than 70.4 MB (64 + 6.4) in your example. But one
> > thing to note is that maintaining these lists by default can also have
> > some overhead unless the list of open transactions crosses a certain
> > threshold.
> >
>
> On further analysis, I realized that the approach discussed here might
> not be the way to go. The idea of dividing transactions into several
> sub-groups is to split a large number of entries into multiple
> sub-groups so that we can reduce the complexity of searching for a
> particular entry. Since we assume that there are no big differences in
> entries' sizes within a sub-group, we can pick the entry to evict in
> O(1). However, what we really need to avoid here is ending up
> increasing the number of evictions, because serializing an entry to
> disk is generally more costly than searching for an entry in memory.
>
> I think that this is no problem for a large-entries subgroup, but when
> it comes to the smallest-entries subgroup, such as entries consuming
> less than 5% of the limit, it could end up evicting many entries. For
> example, there would be a huge difference between serializing 1 entry
> consuming 5% of the memory limit and serializing 5000 entries
> consuming 0.001% of the memory limit. Even if we can select 5000
> entries quickly, I think the latter would be slower in total. The more
> subgroups we create, the more complex the algorithm gets and the more
> overhead it could cause. So I think we need to search for the largest
> entry in order to minimize the number of evictions anyway.
>
> Looking at data structures and algorithms, I think a binaryheap with
> some improvements could be promising. I mentioned before why we cannot
> use the current binaryheap[1]. The missing pieces are efficient ways
> to remove an arbitrary entry and to update an arbitrary entry's key.
> The current binaryheap provides binaryheap_remove_node(), which is
> O(log n), but it requires the entry's position in the binaryheap. We
> can know the entry's position just after binaryheap_add_unordered(),
> but it might change after heapify, and searching for the node's
> position is O(n). So the improvement idea is to add a hash table to
> the binaryheap to track the position of each entry, so that we can
> remove an arbitrary entry in O(log n) and also update an arbitrary
> entry's key in O(log n). This is known as an indexed priority queue.
> I've attached the patches for that (0001 and 0002).
>
> That way, in terms of the reorderbuffer, we can update and remove a
> transaction's memory usage in O(log n) (in the worst case, and O(1) on
> average) and then pick the largest transaction in O(1). Since we might
> need to call ReorderBufferSerializeTXN() even in the non-streaming
> case, we need to maintain the binaryheap anyway.
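
To illustrate the indexed binaryheap idea described above, here is a minimal
self-contained sketch. The names are hypothetical, and the position map is a
plain array keyed by transaction id rather than a hash table, just to keep the
sketch short; the real proposal uses a hash table.

/*
 * Minimal sketch of an "indexed" max-heap: next to the heap array we keep
 * a position map, so updating or removing an arbitrary entry is O(log n)
 * instead of O(n).
 */
#include <stdio.h>

#define MAX_TXNS 1024

typedef struct HeapEntry
{
    int     txn_id;             /* identifies the transaction */
    long    size;               /* memory usage, the ordering key */
} HeapEntry;

static HeapEntry heap[MAX_TXNS];
static int  heap_size = 0;
static int  pos_of[MAX_TXNS];   /* txn_id -> index in heap[] */

static void
swap_nodes(int i, int j)
{
    HeapEntry   tmp = heap[i];

    heap[i] = heap[j];
    heap[j] = tmp;
    pos_of[heap[i].txn_id] = i;
    pos_of[heap[j].txn_id] = j;
}

static void
sift_up(int i)
{
    while (i > 0 && heap[(i - 1) / 2].size < heap[i].size)
    {
        swap_nodes(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void
sift_down(int i)
{
    for (;;)
    {
        int     largest = i;
        int     l = 2 * i + 1;
        int     r = 2 * i + 2;

        if (l < heap_size && heap[l].size > heap[largest].size)
            largest = l;
        if (r < heap_size && heap[r].size > heap[largest].size)
            largest = r;
        if (largest == i)
            break;
        swap_nodes(i, largest);
        i = largest;
    }
}

static void
heap_add(int txn_id, long size)     /* O(log n) */
{
    heap[heap_size].txn_id = txn_id;
    heap[heap_size].size = size;
    pos_of[txn_id] = heap_size;
    sift_up(heap_size++);
}

static void
heap_update(int txn_id, long new_size)  /* O(log n) thanks to pos_of[] */
{
    int     i = pos_of[txn_id];

    heap[i].size = new_size;
    sift_up(i);
    sift_down(i);
}

int
main(void)
{
    heap_add(1, 100);
    heap_add(2, 500);
    heap_add(3, 300);
    heap_update(1, 900);
    /* the largest entry is always at the root: O(1) */
    printf("largest txn: %d (%ld bytes)\n", heap[0].txn_id, heap[0].size);
    return 0;
}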

Since updating the max-heap for each memory counter update could lead
to some regression when the number of transactions being decoded is
small, I've measured the case where the memory counter is updated
frequently:

setup script:
create table test (c int);
selec

Re: why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Michael Paquier
On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
> I have one question: what is blocking the implementation of some variant of
> VACUUM FULL that works like REINDEX CONCURRENTLY? Why can a mechanism similar
> to the one used by REINDEX CONCURRENTLY not be used for VACUUM FULL?

You may be interested in these threads:
https://www.postgresql.org/message-id/CAB7nPqTGmNUFi%2BW6F1iwmf7J-o6sY%2Bxxo6Yb%3DmkUVYT-CG-B5A%40mail.gmail.com
https://www.postgresql.org/message-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg%40mail.gmail.com

VACUUM FULL is CLUSTER under the hood.  One may question whether it
is still a relevant discussion these days if we assume that autovacuum
is able to keep up, because it always keeps up with the house cleanup,
right?  ;)

More seriously, we have a lot more options these days with VACUUM like 
PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
available may have changed things.  So, would it be worth the
complexities around heap manipulations that lower locks would require?
--
Michael




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 16:20:54 +0900,
  Michael Paquier  wrote:

>>> +if (!format_specified)
>>> +/* Set the default format. */
>>> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>>> +
>>> 
>>> I think we can pass "text" in this case instead of NULL. That way,
>>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
>> 
>> Yes, we can do it. But it needs a DefElem allocation. Is it
>> acceptable?
> 
> I don't think that there is a need for a DefElem at all here?

We use defel->location for an error message. (We don't need
to set location for the default "text" DefElem.)

> While I
> am OK with the choice of calling CopyToInit() in the
> ProcessCopyOption*() routines that exist to keep the set of callbacks
> local to copyto.c and copyfrom.c, I think that this should not bother
> about setting opts_out->csv_mode or opts_out->binary but just set
> the opts_out->{to,from}_routine callbacks.

OK. I'll keep opts_out->{csv_mode,binary} in copy.c.

>  Now, all the Init() callbacks are empty for the
> in-core callbacks, so I think that we should just remove it entirely
> for now.  Let's keep the core patch a maximum simple.  It is always
> possible to build on top of it depending on what people need.  It's
> been mentioned that JSON would want that, but this also proves that we
> just don't care about that for all the in-core callbacks, as well.  I
> would choose a minimalistic design for now.

OK. Let's remove Init() callbacks from the first patch set.

> I would be really tempted to put my hands on this patch to put into
> shape a minimal set of changes because I'm caring quite a lot about
> the performance gains reported with the removal of the "if" checks in
> the per-row callbacks, and that's one goal of this thread quite
> independent on the extensibility.  Sutou-san, would you be OK with
> that?

Yes, sure.
(We want to focus on the performance gains in the first
patch set and then focus on extensibility again, right?)

For that purpose, I think that the v7 patch set is more
suitable than the v9 patch set. The v7 patch set doesn't
include Init() callbacks, custom options validation support
or extra Copy{In,Out}Response support. But the v7 patch set
misses the removal of the "if" checks in
NextCopyFromRawFields() that exists in the v9 patch set. I'm
not sure how much performance will improve from this, but it
may be worth a try.
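
Just to illustrate the dispatch pattern we are talking about (resolving a
per-format callback table once instead of checking the format with "if"
tests for every row), here is a self-contained sketch; the names are
hypothetical and unrelated to the actual patch set:

#include <stdio.h>
#include <string.h>

typedef struct Row
{
    int         a;
    const char *b;
} Row;

typedef struct CopyToRoutine
{
    void        (*start_cb) (void);
    void        (*one_row_cb) (const Row *row);
    void        (*end_cb) (void);
} CopyToRoutine;

static void text_start(void) { }
static void text_one_row(const Row *row) { printf("%d\t%s\n", row->a, row->b); }
static void text_end(void) { }

static void csv_start(void) { }
static void csv_one_row(const Row *row) { printf("%d,\"%s\"\n", row->a, row->b); }
static void csv_end(void) { }

static const CopyToRoutine text_routine = {text_start, text_one_row, text_end};
static const CopyToRoutine csv_routine = {csv_start, csv_one_row, csv_end};

int
main(void)
{
    /* The format is resolved to a callback table once, e.g. from FORMAT. */
    const char *format = "csv";
    const CopyToRoutine *routine =
        (strcmp(format, "csv") == 0) ? &csv_routine : &text_routine;
    Row         rows[2] = {{1, "one"}, {2, "two"}};

    routine->start_cb();
    for (int i = 0; i < 2; i++)
        routine->one_row_cb(&rows[i]);  /* no per-row format "if" check */
    routine->end_cb();
    return 0;
}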

Can I prepare the v10 patch set as "the v7 patch set" + "the
removal of the "if" checks in NextCopyFromRawFields()"?
(+ reverting opts_out->{csv_mode,binary} changes in
ProcessCopyOptions().)


Thanks,
-- 
kou





Re: More new SQL/JSON item methods

2024-01-30 Thread Jeevan Chalke
On Mon, Jan 29, 2024 at 11:03 AM Tom Lane  wrote:

> Kyotaro Horiguchi  writes:
> > Having said that, I'm a bit concerned about the case where an overly
> > long string is given there. However, considering that we already have
> > "invalid input syntax for type xxx: %x" messages (including for the
> > numeric), this concern might be unnecessary.
>
> Yeah, we have not worried about that in the past.
>
> > Another concern is that the input value is already a numeric, not a
> > string. This situation occurs when the input is NaN of +-Inf. Although
> > numeric_out could be used, it might cause another error. Therefore,
> > simply writing out as "argument NaN and Infinity.." would be better.
>
> Oh!  I'd assumed that we were discussing a string that we'd failed to
> convert to numeric.  If the input is already numeric, then either
> the error is unreachable or what we're really doing is rejecting
> special values such as NaN on policy grounds.  I would ask first
> if that policy is sane at all.  (I'd lean to "not" --- if we allow
> it in the input JSON, why not in the output?)  If it is sane, the
> error message needs to be far more specific.
>
> regards, tom lane
>

*Consistent error message related to type:*

Agreed that "number" is not a PostgreSQL type and needs a change. As Tom
suggested, we can say "type numeric" here. However, I have seen two
variants of error messages here: (1) when the input is numeric and (2) when
the input is a string. For the first, we have error messages like:
numeric argument of jsonpath item method .%s() is out of range for type
double precision

and for the second, it is like:
string argument of jsonpath item method .%s() is not a valid representation
of a double precision number

The second form says "double precision number". So, in the decimal/number
case, should we use "numeric number" and then similarly "big integer
number"?

What if we use phrases like "for type double precision" at both places,
like:
numeric argument of jsonpath item method .%s() is out of range for type
double precision
string argument of jsonpath item method .%s() is not a valid representation
for type double precision

With this, the rest will be like:
for type numeric
for type bigint
for type integer

Suggestions?

---

*Showing input string in the error message:*

argument "...input string here..." of jsonpath item method .%s() is out of
range for type numeric

If we add the input string in the error, then for some errors, it will be
too big, for example:

-ERROR:  numeric argument of jsonpath item method .double() is out of range
for type double precision
+ERROR:  argument
"1"
of jsonpath item method .double() is out of range for type double precision

Also, for non-string input, we need to convert numeric to string just for
the error message, which seems overkill.

On another note, irrespective of these changes, is it good to show the
given input in the error messages? Error messages are logged and may leak
some details.

I think the existing way seems ok.

---

*NaN and Infinity restrictions:*

I am not sure why NaN and Infinity are not allowed in conversion to double
precision (the .double() method). I have used the same restriction for
.decimal() and .number(). However, as you said, we should have more
specific error messages. I tried that in the attached patch; please share
your views. I have the following wording for that error message:
"NaN or Infinity is not allowed for jsonpath item method .%s()"

Suggestions...


Thanks
-- 
Jeevan Chalke

*Principal, Manager*
*Product Development*



edbpostgres.com


improve_error_for_Nan_Infinity.patch.no
Description: Binary data


Re: meson + libpq_pipeline

2024-01-30 Thread Peter Eisentraut

On 29.01.24 18:37, Alvaro Herrera wrote:

I just realized while looking at Jelte's patch for the new nonblocking
query cancel stuff that the Meson build doesn't run the libpq_pipeline
tests :-(

Is there any way to wire the tests to make it work?


I think it is running already.

For example, here is a cfbot run (that fails for other reasons, but it 
preserves the test artifacts): https://cirrus-ci.com/task/6143446205857792


Here is the libpq_pipeline log: 
https://api.cirrus-ci.com/v1/artifact/task/6143446205857792/testrun/build/testrun/libpq_pipeline/001_libpq_pipeline/log/regress_log_001_libpq_pipeline





Re: why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 9:14 odesílatel Michael Paquier 
napsal:

> On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
> > I have one question: what is blocking the implementation of some variant of
> > VACUUM FULL that works like REINDEX CONCURRENTLY? Why can a mechanism similar
> > to the one used by REINDEX CONCURRENTLY not be used for VACUUM FULL?
>
> You may be interested in these threads:
>
> https://www.postgresql.org/message-id/CAB7nPqTGmNUFi%2BW6F1iwmf7J-o6sY%2Bxxo6Yb%3DmkUVYT-CG-B5A%40mail.gmail.com
>
> https://www.postgresql.org/message-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg%40mail.gmail.com
>
> VACUUM FULL is CLUSTER under the hood.  One may question whether it
> is still a relevant discussion these days if we assume that autovacuum
> is able to keep up, because it always keeps up with the house cleanup,
> right?  ;)
>
> More seriously, we have a lot more options these days with VACUUM like
> PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
> available may have changed things.  So, would it be worth the
> complexities around heap manipulations that lower locks would require?
>

One of my customers is reducing one table from 140GB to 20GB today.  Now he
is able to run archiving. He has to play with pg_repack, and it is working
well today, but I ask myself whether what pg_repack does would be hard to do
internally, given that something similar had to be done for REINDEX
CONCURRENTLY. This is not a common task, and it never will be, but on the
other hand, it could be a nice feature to have, and maybe not too hard to
implement today. But I didn't try it.

I'll read the threads

Pavel


--
> Michael
>


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Michael Paquier
On Tue, Jan 30, 2024 at 05:15:11PM +0900, Sutou Kouhei wrote:
> We use defel->location for an error message. (We don't need
> to set location for the default "text" DefElem.)

Yeah, but you should not need to have this error in the paths that set
the callback routines in opts_out if the same validation happens a few
lines before, in copy.c.

> Yes, sure.
> (We want to focus on the performance gains in the first
> patch set and then focus on extensibility again, right?)

Yep, exactly, the numbers are too good to just ignore.  I don't want
to hijack the thread, but I am really worried about the complexities
this thread is getting into because we are trying to shape the
callbacks in the most generic way possible based on *two* use cases.
This is going to be a never-ending discussion.  I'd rather get some
simple basics, and then we can discuss if tweaking the callbacks is
really necessary or not.  Even after introducing the pg_proc lookups
to get custom callbacks.

> For the purpose, I think that the v7 patch set is more
> suitable than the v9 patch set. The v7 patch set doesn't
> include Init() callbacks, custom options validation support
> or extra Copy{In,Out}Response support. But the v7 patch set
> misses the removal of the "if" checks in
> NextCopyFromRawFields() that exists in the v9 patch set. I'm
> not sure how much performance will improve by this but it
> may be worth a try.

Yeah..  The custom options don't seem like an absolutely strong
requirement for the first shot with the callbacks, or even for the
possibility to retrieve the callbacks from a function call.  I mean,
you could provide some control with SET commands and a few GUCs, at
least, even if that would be strange.  Manipulating a list of
DefElems is the intuitive way to have custom options at query level,
but we also have to guess the set of callbacks from this list of
DefElems coming from the query.  You see my point: I am not sure
if it would be the best thing to process the options twice, especially
when it comes to deciding if a DefElem should be valid or not depending
on the callbacks used.  Or we could use a kind of "special" DefElem
where we could store a set of key:value pairs fed to a callback :)

> Can I prepare the v10 patch set as "the v7 patch set" + "the
> removal of the "if" checks in NextCopyFromRawFields()"?
> (+ reverting opts_out->{csv_mode,binary} changes in
> ProcessCopyOptions().)

Yep, if I got that right, it would make sense to me.  If you can do that,
that would help quite a bit.  :)
--
Michael




RE: speed up a logical replica setup

2024-01-30 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

Thanks for updating the patch!

> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing and the user should use PGCTLTIMEOUT environment variable 
> to
> avoid that the execution is canceled after 60 seconds (default for pg_ctl).
> Even if you set a high value, it might not be enough for cases like
> time-delayed replica. Maybe pg_ctl should accept no timeout as --timeout
> option. I'll include this caveat into the documentation but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I think you might be a bit confused. Even if recovery_min_apply_delay is set
to, e.g., 10h, pg_ctl can still start and stop the server. This is because the
walreceiver serializes changes as soon as they are received; the delay is
applied by the startup process. There is no unflushed data, so the server
instance can be turned off.
If you meant the combination of the recovery timeout and a time-delayed
replica, yes, that would be likely to occur. But in that case, using something
like a --no-timeout option is dangerous. I think we should overwrite
recovery_min_apply_delay to zero. Thoughts?

Below part contains my comments for v11-0001. Note that the ordering is random.

01. doc
```

 -D 
 --pgdata

```

According to other documentation like pg_upgrade, we do not write both longer
and shorter options in the synopsis section.

02. doc
```
  
   pg_createsubscriber takes the publisher and 
subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  

```

I found that you did not include my suggestion [1] without saying why. Do you
dislike the comment, or are you still considering it?

03. doc
```
  -P  connstr
```

Too much blank space after -P.

04. doc
```

 pg_createsubscriber checks if the target data
 directory is used by a physical replica. Stop the physical replica if it is
 running. One of the next steps is to add some recovery parameters that
 requires a server start. This step avoids an error.

```

I think the doc is not updated. Currently (not sure whether that is good), the
physical replica must be running before running pg_createsubscriber.

05. doc
```
 each specified database on the source server. The replication slot name
 contains a pg_createsubscriber prefix. These replication
 slots will be used by the subscriptions in a future step.  A temporary
```

According to the "Replication Slot Management" subsection in
logical-replication.sgml, the format of the names should be described
explicitly, e.g.:

```
These replication slots have generated names: "pg_createsubscriber_%u_%d"
(parameters: Subscription oid, Process id pid).
```

Publications, a temporary slot, and subscriptions should be described as well.

06. doc
```
 Next, pg_createsubscriber creates one 
publication
 for each specified database on the source server. Each publication
 replicates changes for all tables in the database. The publication name

```

Missing update. Publications are created before creating replication slots.

07. general
I think there are some commenting conventions in PG, but this file breaks them:

* Comments must be one-line as much as possible.
* Periods must not be added when the comment is one-line.
* The initial character must be capitalized.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().


09. LogicalRepInfo
```
char   *subconninfo;/* subscription connection string for 
logical
 * replication 
*/
```

As I posted in comment #8 [2], I don't think it is a "subscription connection".
Also, "for logical replication" is a bit misleading because it would not be
passed to workers.

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
/*
 * If --database option is not provided, try to obtain the 
dbname from
 * the publisher conninfo. If dbname parameter is not 
available, error
 * out.
 */

```

I'm not sure that getting the dbname from the conninfo improves the user
experience. I feel it may end up targeting an unintended database.
(I still think the publisher-server should be removed.)

11. check_data_directory
```
/*
 * Is it a cluster directory? These are preliminary checks. It is far from
 * making an accurate check. If it is not a clone from the publisher, it will
 * eventually fail in a future step.
 */
static bool
check_data_directory(const char *datadir)
```

We should also check whether pg_createsubscriber can create a file and a
directory. GetDataDirectoryCreatePerm() verifies that.

12. main
```
/* Register a function to clean up objects in case of failure. */
atexit(cleanup_objects_atexit);
```

According to the manpage, callback functions would not be called when it exits
d

Re: UUID v7

2024-01-30 Thread Andrey M. Borodin


> On 30 Jan 2024, at 12:28, Sergey Prokhorenko  
> wrote:
> 
> 
> I think this phrase is outdated: "This function can optionally accept a
> timestamp used instead of current time.
> This allows implementation of k-way sortable identifiers.”
Fixed.

> This phrase is wrong: "Both functions return a version 4 (random) UUID.”
This applies to functions gen_random_uuid() and uuidv4().
> 
> For this phrase the reason is unclear and the phrase is most likely incorrect:
> if large batches of UUIDs are generated at the
> +   same time it's possible that some UUIDs will store a time that is 
> slightly later
> +   than their actual generation time

I’ve rewritten this phrase, hope it’s more clear now.


Best regards, Andrey Borodin.


v16-0001-Implement-UUID-v7.patch
Description: Binary data


Re: Change GUC hashtable to use simplehash?

2024-01-30 Thread John Naylor
On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma  wrote:
> But given that we know the data length and we have it in a register
> already, it's easy enough to just mask out data past the end with a
> shift. See patch 1. Performance benefit is about 1.5x Measured on a
> small test harness that just hashes and finalizes an array of strings,
> with a data dependency between consecutive hashes (next address
> depends on the previous hash output).

Interesting work! I've taken this idea and (I'm guessing, haven't
tested) improved it by re-using an intermediate step for the
conditional, simplifying the creation of the mask, and moving the
bitscan out of the longest dependency chain. Since you didn't attach
the test harness, would you like to run this and see how it fares?
(v16-0001 is same as your 0001, and v16-0002 builds upon it.) I plan
to test myself as well, but since your test tries to model true
latency, I'm more interested in that one.

> Not sure if the second one is worth the extra code.

I'd say it's not worth optimizing the case we think won't be taken
anyway. I also like having a simple path to assert against.
From be62303df785b80b1bf888b869c5b240a6777af0 Mon Sep 17 00:00:00 2001
From: Ants Aasma 
Date: Mon, 29 Jan 2024 15:16:02 +0200
Subject: [PATCH v16 1/2] Speed up last iteration of aligned fasthash

We know the length of the string so we can mask out end of the
string with a shift. Without this the aligned version was slower
than unaligned on small strings.
---
 src/include/common/hashfn_unstable.h | 31 
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h
index 3d927e1fb1..8ee1b99a20 100644
--- a/src/include/common/hashfn_unstable.h
+++ b/src/include/common/hashfn_unstable.h
@@ -176,6 +176,19 @@ fasthash_accum(fasthash_state *hs, const char *k, int len)
 #define haszero64(v) \
 	(((v) - 0x0101010101010101) & ~(v) & 0x8080808080808080)
 
+/*
+ * Returns non-zero when first byte in memory order is not NUL
+ */
+static inline int
+first_byte_nonzero(uint64 v)
+{
+#ifdef WORDS_BIGENDIAN
+		return v >> 56;
+#else
+		return v & 0xFF;
+#endif
+}
+
 /*
  * all-purpose workhorse for fasthash_accum_cstring
  */
@@ -211,11 +224,12 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	const char *const start = str;
 	int			remainder;
 	uint64		zero_bytes_le;
+	uint64		chunk;
 
 	Assert(PointerIsAligned(start, uint64));
 	for (;;)
 	{
-		uint64		chunk = *(uint64 *) str;
+		chunk = *(uint64 *) str;
 
 		/*
 		 * With little-endian representation, we can use this calculation,
@@ -243,9 +257,18 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	 * byte within the input word by counting the number of trailing (because
 	 * little-endian) zeros and dividing the result by 8.
 	 */
-	remainder = pg_rightmost_one_pos64(zero_bytes_le) / BITS_PER_BYTE;
-	fasthash_accum(hs, str, remainder);
-	str += remainder;
+	if (first_byte_nonzero(chunk))
+	{
+		remainder = pg_rightmost_one_pos64(zero_bytes_le) / BITS_PER_BYTE;
+#ifdef WORDS_BIGENDIAN
+		hs->accum = chunk & ((~0ULL) << (64 - BITS_PER_BYTE*remainder));
+#else
+		hs->accum = chunk & ((~0ULL) >> (64 - BITS_PER_BYTE*remainder));
+#endif
+		fasthash_combine(hs);
+
+		str += remainder;
+	}
 
 	return str - start;
 }
-- 
2.43.0

From a1e1648f3f3a25001c62fffe7dcd422273619e3e Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 30 Jan 2024 16:14:57 +0700
Subject: [PATCH v16 2/2] Shorten dependency chain for computing hash mask

---
 src/include/common/hashfn_unstable.h | 30 
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h
index 8ee1b99a20..0cac3aa380 100644
--- a/src/include/common/hashfn_unstable.h
+++ b/src/include/common/hashfn_unstable.h
@@ -176,19 +176,6 @@ fasthash_accum(fasthash_state *hs, const char *k, int len)
 #define haszero64(v) \
 	(((v) - 0x0101010101010101) & ~(v) & 0x8080808080808080)
 
-/*
- * Returns non-zero when first byte in memory order is not NUL
- */
-static inline int
-first_byte_nonzero(uint64 v)
-{
-#ifdef WORDS_BIGENDIAN
-		return v >> 56;
-#else
-		return v & 0xFF;
-#endif
-}
-
 /*
  * all-purpose workhorse for fasthash_accum_cstring
  */
@@ -225,6 +212,7 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	int			remainder;
 	uint64		zero_bytes_le;
 	uint64		chunk;
+	uint64		mask;
 
 	Assert(PointerIsAligned(start, uint64));
 	for (;;)
@@ -257,14 +245,22 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	 * byte within the input word by counting the number of trailing (because
 	 * little-endian) zeros and dividing the result by 8.
 	 */
-	if (first_byte_nonzero(chunk))
+	/*
+	 * Create a mask for the remaining bytes and
+	 * combine them into the hash. It would be harmless if the mask also covered the NUL
+	 * terminator, except for t

Re: Documentation to upgrade logical replication cluster

2024-01-30 Thread vignesh C
On Mon, 29 Jan 2024 at 16:01, Bharath Rupireddy
 wrote:
>
> On Mon, Jan 29, 2024 at 10:10 AM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v5 version patch has the changes
> > for the same.
>
> Thanks for working on this. Here are some comments on the v5 patch:
>
> 1.
> +  
> +   Migration of logical replication clusters is possible only when all the
> +   members of the old logical replication clusters are version 17.0 or later.
>
> Perhaps define what logical replication cluster is either in glossary
> or within a parenthesis next to the first use in the docs? This will
> help developers understand it better and will not confuse them with
> postgres cluster. I see it being used for the first time in code
> comments 9a17be1e2, but this patch uses it for the first time in the
> docs.

I have added it to the glossary.

> 2.
> +   Before reading this section, refer  page for
> +   more details about pg_upgrade.
> +  
>
> This looks extraneous, we can just link to pg_upgrade on the first use
> of pg_upgrade, change the following
>
> +   
> +pg_upgrade attempts to migrate logical
> +slots. This helps avoid the need for manually defining the same
>
> to
>
> +   
> + attempts to migrate logical
> +slots. This helps avoid the need for manually defining the same

Modified

> 3.
> + transactional, the user is advised to take backups. Backups can be taken
> + as described in .
> +
>
> How about simplifying the above to "the user is advised to take
> backups as described in " instead
> of two statements?

Modified

> 4.
>   subscription is temporarily disabled, by executing
> +ALTER SUBSCRIPTION
> ... DISABLE.
> +Re-enable the subscription after the upgrade.
> +   
>
> Is it to avoid repeated failures of logical replication apply workers
> on the subscribers? Isn't it good to say why subscription needs to be
> disabled?

Added it

> 5.
> +   
> +There are some prerequisites for pg_upgrade to
> +be able to upgrade the logical slots. If these are not met an error
> +will be reported.
> +   
>
> I think it's better to be "Following are prerequisites for
> pg_upgrade to.."?

Modified

> 6.
> +
> + 
> +  The old cluster has replicated all the transactions and logical 
> decoding
> +  messages to subscribers.
> + 
>
> I think it's better to be "The old cluster must have replicated all
> the transactions and "?

Modified

> 7.
> + 
> +  The new cluster must not have permanent logical slots, i.e.,
> +  there must be no slots where
> +   linkend="view-pg-replication-slots">pg_replication_slots.temporary
> +  is false.
>
> I think we better specify a full SQL query as opposed to just
> specifying one output column and the view name.
>
> 
>  The new cluster must not have permanent logical slots, i.e., a query like:
> 
> SELECT count(*) FROM pg_replication_slots WHERE slot_type = 'logical'
> AND temporary IS false;
> 
> must return 0.
>

Modified

> 8.
> +   If the old cluster is prior to 17.0, then no slots on the primary are
> +   copied to the new standby, so all the slots must be recreated 
> manually.
> +   If the old cluster is 17.0 or later, then only logical slots on the
>
> I think it's better to say "version 17.0" instead of just "17.0".

Modified

> 9.
> +   primary are copied to the new standby, but other slots on the
> old standby
>
> "but other slots on the old standby" - is it slots on the old standby
> or old cluster?
>
> I think it's the other way around: the old cluster needs to be
> replaced with the old standby in the newly added paragraph.

Modified it to "old primary", as we upgrade the primary and do an rsync

> 10.
> Change
> +   primary are copied to the new standby, but other slots on the
> old standby
> +   are not copied so must be recreated manually.
>
> to
>
> +   primary are copied to the new standby, but other slots on the
> old standby
> +   are not copied, so must be recreated manually.

Modified

> 11.
> +   
> +
> + The logical replication restrictions apply to logical replication 
> cluster
> + upgrades also. See  
> for
> + the details of logical replication restrictions.
> +
>
> How about just say "See  linkend="logical-replication-restrictions"/> for details." instead of
> using logical replication restrictions more than once in the same
> para?

Modified

> 12.
> +
> + The prerequisites of publisher upgrade apply to logical replication
> + cluster upgrades also. See 
> + for the details of publisher upgrade prerequisites.
>
> How about just say "See 
> for details." instead of using publisher upgrade prerequisites more
> than once in the same para?

Modified

> 13.
> +
> + The prerequisites of subscriber upgrade apply to logical replication
> + cluster upgrades also. See 
> + for the details of subscriber upgrade prerequisites.
> +
>
> How about just say "See 
> for details." instead of using subsc

Re: Report planning memory in EXPLAIN ANALYZE

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-30, Ashutosh Bapat wrote:

> On Mon, Jan 29, 2024 at 10:43 PM Alvaro Herrera  
> wrote:

> > I also added a trivial test for EXPLAIN EXECUTE, which was uncovered,
> > and some other minor stylistic changes.
> 
> Thanks. Looks fine to me.

Thanks for looking!

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
¡Ay, ay, ay!  Con lo mucho que yo lo quería (bis)
se fue de mi vera ... se fue para siempre, pa toíta ... pa toíta la vida
¡Ay Camarón! ¡Ay Camarón!(Paco de Lucía)




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-30 Thread John Naylor
On Tue, Jan 30, 2024 at 7:56 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 29, 2024 at 8:48 PM John Naylor  wrote:

> > I meant the macro could probably be
> >
> > Max(SLAB_DEFAULT_BLOCK_SIZE, (size) * N)
> >
> > (Right now N=32). I also realize I didn't answer your question earlier
> > about block sizes being powers of two. I was talking about PG in
> > general -- I was thinking all block sizes were powers of two. If
> > that's true, I'm not sure if it's because programmers find the macro
> > calculations easy to reason about, or if there was an implementation
> > reason for it (e.g. libc behavior). 32*2088 bytes is about 65kB, or
> > just above a power of two, so if we did  round that up it would be
> > 128kB.
>
> Thank you for your explanation. It might be better to follow other
> codes. Does the calculation below make sense to you?
>
> RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
> Size inner_blocksize = SLAB_DEFAULT_BLOCK_SIZE;
> while (inner_blocksize < 32 * size_class.allocsize)
>  inner_blocksize <<= 1;

It does make sense, but we can do it more simply:

Max(SLAB_DEFAULT_BLOCK_SIZE, pg_nextpower2_32(size * 32))
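
For illustration, a self-contained sketch of that rounding, using the
2088-byte size class mentioned upthread; next_pow2_32 below is just a
stand-in for pg_nextpower2_32, and the 8kB default slab block size is
assumed here only for the sketch:

#include <stdio.h>
#include <stdint.h>

/* Round v up to the next power of two (v > 0); stand-in for pg_nextpower2_32. */
static uint32_t
next_pow2_32(uint32_t v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}

#define SLAB_DEFAULT_BLOCK_SIZE (8 * 1024)  /* assumed for this sketch */
#define Max(a, b)               ((a) > (b) ? (a) : (b))

int
main(void)
{
    uint32_t    allocsize = 2088;   /* example size class from this thread */
    uint32_t    blocksize = Max(SLAB_DEFAULT_BLOCK_SIZE,
                                next_pow2_32(allocsize * 32));

    /* 2088 * 32 = 66816 bytes, rounded up to 131072 (128kB) */
    printf("block size: %u\n", blocksize);
    return 0;
}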




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-30, Pavel Stehule wrote:

> One of my customers is reducing one table from 140GB to 20GB today.  Now he
> is able to run archiving. He has to play with pg_repack, and it is working
> well today, but I ask myself whether what pg_repack does would be hard to do
> internally, given that something similar had to be done for REINDEX
> CONCURRENTLY. This is not a common task, and it never will be, but on the
> other hand, it could be a nice feature to have, and maybe not too hard to
> implement today. But I didn't try it.

FWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.

So thinking about your question, I think it might be possible to
integrate a tool that works like pg_squeeze, such that it runs when
VACUUM is invoked -- either under some new option, or just replace the
code under FULL, not sure.  If the Cybertec people allows it, we could
just grab the pg_squeeze code and add it to the things that VACUUM can
run.

Now, pg_squeeze has some additional features, such as periodic
"squeezing" of tables.  In a first attempt, for simplicity, I would
leave that stuff out and just allow it to run from the user invoking it,
and then have the command to do a single run.  (The scheduling features
could be added later, or somehow integrated into autovacuum, or maybe
something else.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We're here to devour each other alive"(Hobbes)




Re: UUID v7

2024-01-30 Thread Junwang Zhao
Hi Andrey,

On Tue, Jan 30, 2024 at 5:56 PM Andrey M. Borodin  wrote:
>
>
>
> > On 30 Jan 2024, at 12:28, Sergey Prokhorenko 
> >  wrote:
> >
> >
> > I think this phrase is outdated: "This function can optionally accept a
> > timestamp used instead of current time.
> > This allows implementation of k-way sortable identifiers.”
> Fixed.
>
> > This phrase is wrong: "Both functions return a version 4 (random) UUID.”
> This applies to functions gen_random_uuid() and uuidv4().
> >
> > For this phrase the reason is unclear and the phrase is most likely 
> > incorrect:
> > if large batches of UUIDs are generated at the
> > +   same time it's possible that some UUIDs will store a time that is 
> > slightly later
> > +   than their actual generation time
>
> I’ve rewritten this phrase, hope it’s more clear now.
>
>
> Best regards, Andrey Borodin.

+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+ uint16_t result;
+ result = uuid->data[8] >> 6;
+
+ PG_RETURN_UINT16(result);
+}
\ No newline at end of file

It's always good to add a newline at the end of a  source file, though
this might be nitpicky.

-- 
Regards
Junwang Zhao




Re: Synchronizing slots from primary to standby

2024-01-30 Thread shveta malik
On Tue, Jan 30, 2024 at 11:31 AM Amit Kapila  wrote:
>
> In this regard, I feel we don't need to dump/restore the 'FAILOVER'
> option non-binary upgrade paths similar to the 'ENABLE' option. For
> binary upgrade, if the failover option is enabled, then we can enable
> it using Alter Subscription SET (failover=true). Let's add one test
> corresponding to this behavior in
> postgresql\src\bin\pg_upgrade\t\004_subscription.

Changed pg_dump behaviour as suggested and added additional test.

> Additionally, we need to update the pg_dump docs for the 'failover'
> option. See "When dumping logical replication subscriptions, .." [1].
> I think we also need to update the connect option docs in CREATE
> SUBSCRIPTION [2].

Updated docs.

> [1] - https://www.postgresql.org/docs/devel/app-pgdump.html
> [2] - https://www.postgresql.org/docs/devel/sql-createsubscription.html

PFA v73-0001 which addresses the above comments. Other patches will be
rebased and posted after pushing this one. Thanks Hou-San for adding
pg_upgrade test for failover.

thanks
Shveta


v73-0001-Add-a-failover-option-to-subscriptions.patch
Description: Binary data


Re: [17] CREATE SUBSCRIPTION ... SERVER

2024-01-30 Thread Ashutosh Bapat
On Wed, Jan 24, 2024 at 7:15 AM Jeff Davis  wrote:
>
> On Tue, 2024-01-23 at 15:21 +0530, Ashutosh Bapat wrote:
> > I am with the prefix. The changes it causes make review difficult. If
> > you can separate those changes into a patch that will help.
>
> I ended up just removing the dummy FDW. Real users are likely to want
> to use postgres_fdw, and if not, it's easy enough to issue a CREATE
> FOREIGN DATA WRAPPER. Or I can bring it back if desired.
>
> Updated patch set (patches are renumbered):
>
>   * removed dummy FDW and test churn
>   * made a new pg_connection_validator function which leaves
> postgresql_fdw_validator in place. (I didn't document the new function
> -- should I?)
>   * included your tests improvements
>   * removed dependency from the subscription to the user mapping -- we
> don't depend on the user mapping for foreign tables, so we shouldn't
> depend on them here. Of course a change to a user mapping still
> invalidates the subscription worker and it will restart.
>   * general cleanup
>

Thanks.

> Overall it's simpler and hopefully easier to review. The patch to
> introduce the pg_create_connection role could use some more discussion,
> but I believe 0001 and 0002 are nearly ready.

0001 commit message says "in preparation of CREATE SUBSCRIPTION" but I
do not see the function being used anywhere except in testcases. Am I
missing something? Is this function necessary for this feature?

But more importantly this function and its minions are closely tied
with libpq and not an FDW. Converting a server and user mapping to
conninfo should be delegated to the FDW being used since that FDW
knows best how to use those options. Similarly options_to_conninfo()
should be delegated to the FDW. I imagine that the FDWs which want to
support subscriptions will need to implement hooks in
WalReceiverFunctionsType which seems to be designed to be pluggable.
--- quote
This API should be considered internal at the moment, but we could open it
up for 3rd party replacements of libpqwalreceiver in the future, allowing
pluggable methods for receiving WAL.
--- unquote
Not all of those hooks are applicable to every FDW since the publisher
may be different and may not provide all the functionality. So we
might need to rethink WalReceiverFunctionsType interface eventually.
But for now, we will need to change postgres_fdw to implement it.

We should mention something about the user mapping that will be used
to connect to the SERVER when a subscription specifies SERVER. I am not sure
where to mention this. Maybe we can get some clue from the foreign server
documentation.

--
Best Wishes,
Ashutosh Bapat




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 11:31 odesílatel Alvaro Herrera 
napsal:

> On 2024-Jan-30, Pavel Stehule wrote:
>
> > One of my customers is reducing one table from 140GB to 20GB today.  Now he
> > is able to run archiving. He has to play with pg_repack, and it is working
> > well today, but I ask myself whether what pg_repack does would be hard to do
> > internally, given that something similar had to be done for REINDEX
> > CONCURRENTLY. This is not a common task, and it never will be, but on the
> > other hand, it could be a nice feature to have, and maybe not too hard to
> > implement today. But I didn't try it.
>
> FWIW a newer, more modern and more trustworthy alternative to pg_repack
> is pg_squeeze, which I discovered almost by random chance, and soon
> discovered I liked it much more.
>
> So thinking about your question, I think it might be possible to
> integrate a tool that works like pg_squeeze, such that it runs when
> VACUUM is invoked -- either under some new option, or just replace the
> code under FULL, not sure.  If the Cybertec people allows it, we could
> just grab the pg_squeeze code and add it to the things that VACUUM can
> run.
>
> Now, pg_squeeze has some additional features, such as periodic
> "squeezing" of tables.  In a first attempt, for simplicity, I would
> leave that stuff out and just allow it to run from the user invoking it,
> and then have the command to do a single run.  (The scheduling features
> could be added later, or somehow integrated into autovacuum, or maybe
> something else.)
>

Some basic variant (without autovacuum support) can be good enough. We have
no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
for it (sure, that may be limited by my perspective). The need to reduce
table size is not too common (a lot of use cases are better covered by
using partitioning), but sometimes it arises, and then a simple built-in
solution can be helpful.





> --
> Álvaro Herrera PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
> "We're here to devour each other alive"(Hobbes)
>


Re: MERGE ... RETURNING

2024-01-30 Thread jian he
I didn't find any issues with v15.
There is no commit message in the patch; if a commit message were there, I
could help proofread it.




Re: Returning non-terminated string in ECPG Informix-compatible function

2024-01-30 Thread Oleg Tselebrovskiy

Here's the code for bug reproduction:
#include <stdio.h>
#include <stdlib.h>

EXEC SQL INCLUDE pgtypes_interval.h;
EXEC SQL INCLUDE ecpg_informix.h;

EXEC SQL BEGIN DECLARE SECTION;
char dirty_str[100] = "a__c_d_";
interval *interval_ptr;
EXEC SQL END DECLARE SECTION;

int main()
{
interval_ptr = (interval *) malloc(sizeof(interval));
interval_ptr->time = 1;
interval_ptr->month = 240;

printf("dirty_str contents before intoasc: %s\n", dirty_str);
intoasc(interval_ptr, dirty_str);
printf("dirty_str contents after intoasc: %s\n", dirty_str);
return 0;
}

And here's the output:

dirty_str contents before intoasc: a__c_d_
dirty_str contents after intoasc: @ 20 years 1 min 40 secs_d_


I compiled it with the following commands (provided for quicker
reproduction):

/path/to/pgsql/bin/ecpg informix_bug_example.pgc
gcc -I/path/to/pgsql/include -c informix_bug_example.c
gcc -o informix_bug_example informix_bug_example.o -L/path/to/pgsql/lib 
-lecpg -lecpg_compat


I've also found at least one project that uses intoasc() in it - 
https://github.com/credativ/informix_fdw/


Oleg Tselebrovskiy, Postgres Pro




Re:Bug report and fix about building historic snapshot

2024-01-30 Thread cca5507
> This patch may be better, which only tracks catalog-modified transactions.
Can anyone help review this patch?
Thanks.
--
Regards,
ChangAo Chen

Re: Some revises in adding sorting path

2024-01-30 Thread David Rowley
On Mon, 29 Jan 2024 at 22:39, Richard Guo  wrote:
> So in the v3 patch I've brought back the logic that considers
> incremental sort on partial paths in gather_grouping_paths().  However,
> I failed to compose a test case for this scenario without having to
> generate a huge table.  So in the v3 patch I did not include a test case
> for this aspect.

Can you share the test with the huge table?

I tried and failed as, if I'm not mistaken, you're talking about a
parallel aggregate query with an incremental sort between the Partial
Aggregate node and the Finalize Group Aggregate node.  If the partial
aggregate was a Group Aggregate, then it would already be correctly
sorted.  We don't need a more strict sort ordering to perform the
Finalize Group Aggregate, the results must already be sorted by at
least the GROUP BY clause.  If the partial aggregate had opted to Hash
Aggregate, then there'd be no presorted keys, so we could only get a
full sort.  I can't see any way to get an incremental sort between the
2 aggregate phases.

What am I missing?

I also tried reverting your changes to planner.c to see if your new
tests would fail. They all passed. So it looks like none of these
tests are testing anything new.

David




Re: Incorrect cost for MergeAppend

2024-01-30 Thread Alexander Kuzmenkov
Here is a small patch that reproduces the problem on two tables with
inheritance, and fixes it. I'll add it to the Commitfest.

On Tue, Jan 30, 2024 at 8:20 AM Ashutosh Bapat
 wrote:
>
> On Mon, Jan 29, 2024 at 6:11 PM Alexander Kuzmenkov
>  wrote:
> >
> > Hello hackers,
> >
> > While investigating some query plans, I noticed some code that seems
> > to be wrong: when create_merge_append_path() estimates the cost of
> > sorting an input, it calls cost_sort() passing subpath->parent->tuples
> > as the number of tuples. Shouldn't it use subpath->parent->rows or
> > even subpath->rows instead? The `tuples` variable doesn't account for
> > the filters on the relation, so this leads to incorrect cost estimates
> > when there are highly selective filters, and Sort + Append is chosen
> > instead of MergeAppend.
>
> All other callers of cost_sort() except plan_cluster_use_sort() are
> using rows instead of tuples. Even plan_cluster_use_sort() has
> rel->rows = rel->tuples, it's actually passing rows. So agree with
> your suggestion. However a test will be good since this code is quite
> old.
>
> --
> Best Wishes,
> Ashutosh Bapat
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8dbf790e89..2e1ec41a54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1470,7 +1470,7 @@ create_merge_append_path(PlannerInfo *root,
 	  root,
 	  pathkeys,
 	  subpath->total_cost,
-	  subpath->parent->tuples,
+	  subpath->rows,
 	  subpath->pathtarget->width,
 	  0.0,
 	  work_mem,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 56a40d50f9..22e9ca0b10 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1716,6 +1716,28 @@ order by t1.b limit 10;
 reset enable_nestloop;
 drop table matest0 cascade;
 NOTICE:  drop cascades to table matest1
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a int primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1);
+insert into ma1 select generate_series(1, 1);
+analyze ma0;
+analyze ma1;
+explain (costs off) select * from ma0 where a < 1000 order by a;
+QUERY PLAN 
+---
+ Merge Append
+   Sort Key: ma0.a
+   ->  Index Only Scan using ma0_pkey on ma0 ma0_1
+ Index Cond: (a < 1000)
+   ->  Sort
+ Sort Key: ma0_2.a
+ ->  Seq Scan on ma1 ma0_2
+   Filter: (a < 1000)
+(8 rows)
+
+drop table ma0 cascade;
+NOTICE:  drop cascades to table ma1
 --
 -- Test merge-append for UNION ALL append relations
 --
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 971aac54fd..811da462b7 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -624,6 +624,19 @@ reset enable_nestloop;
 
 drop table matest0 cascade;
 
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a int primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1);
+insert into ma1 select generate_series(1, 1);
+analyze ma0;
+analyze ma1;
+
+explain (costs off) select * from ma0 where a < 1000 order by a;
+
+drop table ma0 cascade;
+
+
 --
 -- Test merge-append for UNION ALL append relations
 --


003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-01-30 Thread Nazir Bilal Yavuz
Hi,

I was trying to install newer Perl versions to Windows CI images and
found that 003_extrafiles.pl test fails on Windows with:

(0.183s) not ok 2 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] =
'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'
# $expected->[0] =
'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'

(0.263s) not ok 5 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] =
'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'
# $expected->[0] =
'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'

It looks like File::Find converts backslashes to slashes in the newer
Perl versions. I tried to find the related commit and found this:
https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d

To solve this, I did the same conversion for Windows before comparing
the paths. And to support all Perl versions, I decided to always
convert them on Windows regardless of the Perl's version. The fix is
attached.

I looked at other File::Find appearances in the code but they do not
compare the paths. So, I do not think there is any need to fix them.

Any kind of feedback would be appreciated.

--
Regards,
Nazir Bilal Yavuz
Microsoft
From b28f48fe7d98d3dd7d2fcf652bfa5c8a4cd1c2d6 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Tue, 30 Jan 2024 13:12:41 +0300
Subject: [PATCH v1] Fix 003_extrafiles.pl test for the Windows

File::Find converts backslashes to slashes in the newer Perl versions.
See: https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d

So, do the same conversion for Windows before comparing paths. To
support all Perl versions, always convert them on Windows regardless of
the Perl's version.
---
 src/bin/pg_rewind/t/003_extrafiles.pl | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index d54dcddcbb6..7e4c2771ce7 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -78,6 +78,19 @@ sub run_test
 		},
 		$test_primary_datadir);
 	@paths = sort @paths;
+
+	# File::Find converts backslashes to slashes in the newer Perl
+	# versions. To support all Perl versions, do the same conversion
+	# for Windows before comparing the paths.
+	if ($PostgreSQL::Test::Utils::windows_os)
+	{
+		for my $filename (@paths)
+		{
+			$filename =~ s{\\}{/}g;
+		}
+		$test_primary_datadir =~ s{\\}{/}g;
+	}
+
 	is_deeply(
 		\@paths,
 		[
-- 
2.43.0



Re: Reducing output size of nodeToString

2024-01-30 Thread Matthias van de Meent
On Tue, 9 Jan 2024, 09:23 Peter Eisentraut,  wrote:
>
> On 04.01.24 00:23, Matthias van de Meent wrote:
> > Something like the attached? It splits out into the following
> > 0001: basic 'omit default values'
>
>   /* Write an integer field (anything written as ":fldname %d") */
> -#define WRITE_INT_FIELD(fldname) \
> +#define WRITE_INT_FIELD_DIRECT(fldname) \
>  appendStringInfo(str, " :" CppAsString(fldname) " %d",
> node->fldname)
> +#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
> +   ((node->fldname == default) ? (0) : WRITE_INT_FIELD_DIRECT(fldname))
> +#define WRITE_INT_FIELD(fldname) \
> +   WRITE_INT_FIELD_DEFAULT(fldname, 0)
>
> Do we need the _DIRECT macros at all?  Could we not combine that into
> the _DEFAULT ones?

I was planning on using them to reduce the size of generated code for
select fields that we know we will always serialize, but then later
decided against doing that in this patch as it'd add even more
arbitrary annotations to nodes. This is a leftover from that.

> I think the way the conditional operator (?:) is written is not
> technically correct C,
> [...]
> I think it would be better to write this
> in a more straightforward way like
>
> #define WRITE_INT_FIELD_DEFAULT(fldname, default) \
> do { \
> [...]
> while (0)
>
> Relatedly, this
>
> +/* a scaffold function to read an optionally-omitted field */
> [...]
> would need to be written with a do { } while (0) wrapper around it.

I'll fix that.
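
For what it's worth, a sketch of the do {} while (0) form being suggested,
filled in from the WRITE_INT_FIELD_DIRECT definition quoted above; this is
only a sketch (it assumes the usual outfuncs context with node and str in
scope), not the actual patch:

/* Write an integer field only when it differs from its default. */
#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
    do { \
        if (node->fldname != (default)) \
            appendStringInfo(str, " :" CppAsString(fldname) " %d", \
                             node->fldname); \
    } while (0)

#define WRITE_INT_FIELD(fldname) \
    WRITE_INT_FIELD_DEFAULT(fldname, 0)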

> > 0002: reset location and other querystring-related node fields for all
> > catalogs of type pg_node_tree.
>
> This goal makes sense, but I think it can be done in a better way.  If
> you look into the area of stringToNode(), stringToNodeWithLocations(),
> and stringToNodeInternal(), there already is support for selectively
> resetting or omitting location fields.  Notably, this works with the
> existing automated knowledge of where the location fields are and
> doesn't require a new hand-maintained table.  I think the way forward
> here would be to apply a similar toggle to nodeToString() (the reverse).

I'll try to work something out for that.

> > 0003: add default marking on typmod fields.
> > 0004 & 0006: various node fields marked with default() based on
> > observed common or initial values of those fields
>
> I think we could get about half the benefit here more automatically, by
> creating a separate type for typmods, like
>
> typedef int32 TypMod;
>
> and then having the node support automatically generate the
> serialization support with a -1 default.

Hm, I suspect that the code churn for that would be significant. I'd
also be confused when the type in storage (pg_attribute, pg_type's
typtypmod) is still int32 when it would be TypMod only in nodes.

> (A similar thing could be applied to the location fields, which would
> allow us to get rid of the current hack of parsing out the name.)

I suppose so.

> Most of the other defaults I'm doubtful about.  First, we are colliding
> here between the goals of minimizing the storage size and making the
> debug output more readable.

I've never really wanted to make the output "more readable". The
current one is too verbose, yes.

> If a Query dump would now omit the
> commandType field if it is CMD_SELECT, I think that would be widely
> confusing, and one would need to check the source code to identify the
> reason.

AFAIK, SELECT is the only command type you can possibly store in a
view (insert/delete/update/utility are all invalid there, and while
I'm not fully certain about MERGE, I'd say it's certainly a niche).

> Also, what if we later decide to change a "default" for a
> field.  Then output between version would differ.  Of course, node
> output does change between versions in general, but these kinds of
> differences would be confusing.

I've not heard of anyone trying to read and compare the contents of
pg_node_tree manually where they're not trying to debug some
deep-nested issue. Note

> Second, this relies on hand-maintained
> annotations that were created by you presumably through a combination of
> intuition and testing, based on what is in the template database.  Do we
> know whether this matches real-world queries created by users later?

No, or at least I don't know this for certain. But I think it's a good start.

> Also, my experience dealing with the node support over the last little
> while is that these manually maintained exceptions get ossified and
> outdated and create a maintenance headache for the future.

I'm not sure what headache this would become. nodeToString is a fairly
straightforward API with (AFAIK) no external dependencies, where only
nodes go in and out. The metadata on top of that will indeed require
some maintenance, but AFAIK only in the areas that read and utilize
said metadata. While it certainly wouldn't be great if we didn't have
this metadata, it'd be no worse than not having compression.

> > 0005: truncate trailing 0s from outDatum
>
> Does this significantly af

Re: why there is not VACUUM FULL CONCURRENTLY?

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-30, Pavel Stehule wrote:

> some basic variant (without autovacuum support) can be good enough. We have
> no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
> for it (sure, it can be limited by my perspective) . The necessity of
> reducing table size is not too common (a lot of use cases are better
> covered by using partitioning), but sometimes it is, and then buildin
> simple available solution can be helpful.

That's my thinking as well.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: meson + libpq_pipeline

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-29, Jelte Fennema-Nio wrote:

> On Mon, 29 Jan 2024 at 19:37, Tristan Partin  wrote:
> >
> > On Mon Jan 29, 2024 at 11:37 AM CST, Alvaro Herrera wrote:
> > > I just realized while looking at Jelte's patch for the new nonblocking
> > > query cancel stuff that the Meson build doesn't run the libpq_pipeline
> > > tests :-(

> > I can try to take a look for you. Not sure how hard it will be, but
> > I can take a crack at it this week.
> 
> It already does afaik.
> 
> src/test/modules/libpq_pipeline/meson.build includes rules to execute
> t/001_libpq_pipeline.pl

Oh, you're right, I missed that.  I was looking for something in the
meson rules that would run "libpq_pipeline tests", but actually that's
what t/001_libpq_pipeline.pl does internally.  (Probably a good thing
too, because it's quite likely that what libpq_pipeline emits as output
is not valid TAP.  I never actually tested it that way.)

Thanks Tristan for offering to help,

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)




Re: Some revises in adding sorting path

2024-01-30 Thread Richard Guo
On Tue, Jan 30, 2024 at 7:00 PM David Rowley  wrote:

> On Mon, 29 Jan 2024 at 22:39, Richard Guo  wrote:
> > So in the v3 patch I've brought back the logic that considers
> > incremental sort on partial paths in gather_grouping_paths().  However,
> > I failed to compose a test case for this scenario without having to
> > generate a huge table.  So in the v3 patch I did not include a test case
> > for this aspect.
>
> Can you share the test with the huge table?


The test had been shown in upthread [1].  Pasting it here:

create table t (a int, b int, c int, d int);
insert into t select i%10, i%10, i%10, i%10 from
generate_series(1,100)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
QUERY PLAN
--
 Finalize GroupAggregate
   Group Key: a, c, b, (parallel_safe_volatile(d))
   ->  Gather Merge
 Workers Planned: 2
 ->  Incremental Sort
   Sort Key: a, c, b, (parallel_safe_volatile(d))
   Presorted Key: a
   ->  Partial GroupAggregate
 Group Key: a, b, c, (parallel_safe_volatile(d))
 ->  Incremental Sort
   Sort Key: a, b, c, (parallel_safe_volatile(d))
   Presorted Key: a, b
   ->  Parallel Index Scan using t_a_b_idx on t
(13 rows)


> I tried and failed as, if I'm not mistaken, you're talking about a
> parallel aggregate query with an incremental sort between the Partial
> Aggregate node and the Finalize Group Aggregate node.  If the partial
> aggregate was a Group Aggregate, then it would already be correctly
> sorted.  We don't need a more strict sort ordering to perform the
> Finalize Group Aggregate, the results must already be sorted by at
> least the GROUP BY clause.  If the partial aggregate had opted to Hash
> Aggregate, then there'd be no presorted keys, so we could only get a
> full sort.  I can't see any way to get an incremental sort between the
> 2 aggregate phases.
>
> What am I missing?


This was true before 0452b461bc, and I reached the same conclusion in
[2].  Quote it here:

"
But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.
"

But if we've reordered the group by keys to match the input path's
pathkeys, we might have a partial GroupAggregate that is partially
sorted.  See the plan above.


> I also tried reverting your changes to planner.c to see if your new
> tests would fail. They all passed. So it looks like none of these
> tests are testing anything new.


This patchset does not aim to introduce anything new; it simply
refactors the existing code.  The newly added tests are used to show
that the code that is touched here is not redundant, but rather
essential for generating certain paths.  I remember the tests were added
per your comment in [3].

[1]
https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com

Thanks
Richard


Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-01-30 Thread Nazir Bilal Yavuz
Hi,

On Tue, 30 Jan 2024 at 14:21, Nazir Bilal Yavuz  wrote:
>
> It looks like File::Find converts backslashes to slashes in the newer
> Perl versions. I tried to find the related commit and found this:
> https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d

I forgot to mention that I used Strawberry Perl and these errors
started with 'Perl v5.36.1.1'.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-01-30 Thread Andrew Dunstan



On 2024-01-30 Tu 06:21, Nazir Bilal Yavuz wrote:

Hi,

I was trying to install newer Perl versions to Windows CI images and
found that 003_extrafiles.pl test fails on Windows with:

(0.183s) not ok 2 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] =
'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'
# $expected->[0] =
'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'

(0.263s) not ok 5 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] =
'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'
# $expected->[0] =
'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'

It looks like File::Find converts backslashes to slashes in the newer
Perl versions. I tried to find the related commit and found this:
https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d

To solve this, I did the same conversion for Windows before comparing
the paths. And to support all Perl versions, I decided to always
convert them on Windows regardless of the Perl's version. The fix is
attached.

I looked at other File::Find appearances in the code but they do not
compare the paths. So, I do not think there is any need to fix them.

Any kind of feedback would be appreciated.



Looks reasonable on the face of it. I'll see about pushing this today.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





RE: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-01-30 Thread Zhijie Hou (Fujitsu)
On Tuesday, January 30, 2024 11:21 AM vignesh C  wrote:
> 
> On Tue, 30 Jan 2024 at 07:24, Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Monday, January 29, 2024 9:22 PM vignesh C 
> wrote:
> > >
> > > On Fri, 26 Jan 2024 at 11:30, Alexander Lakhin 
> wrote:
> > > >
> > > > Hello hackers,
> > > >
> > > > After determining a possible cause for intermittent failures of
> > > > the test subscription/031_column_list [1], I was wondering what
> > > > makes another subscription test (014_binary) fail on the buildfarm:
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly&d
> > > > t=20
> > > > 24-01-22%2001%3A19%3A03
> > > >
> > >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2
> > > 02
> > > > 4-01-14%2018%3A19%3A20
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt
> > > > =202
> > > > 3-12-21%2001%3A11%3A52
> > > >
> > >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2
> > > 02
> > > > 3-11-27%2001%3A42%3A39
> > > >
> > > > All those failures caused by a timeout when waiting for a message
> > > > expected in _subscriber.log. For example, in the snakefly's case:
> > > > [01:14:46.158](1.937s) ok 7 - check synced data on subscriber with
> > > > custom type timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)?
> > > > incorrect binary data format) at
> > > /home/bf/bf-build/piculet/HEAD/pgsql/src/test/subscription/t/014_bin
> > > ary.pl
> > > line 269.
> > > >
> > > > _subscriber.log contains:
> > > > 2023-12-21 01:14:46.215 UTC [410039] 014_binary.pl LOG:  statement:
> > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
> > > > 2023-12-21 01:17:46.756 UTC [40] ERROR:  could not receive
> > > > data from WAL stream: server closed the connection unexpectedly
> > > >  This probably means the server terminated abnormally
> > > >  before or while processing the request.
> > > > 2023-12-21 01:17:46.760 UTC [405057] LOG:  background worker
> > > > "logical replication apply worker" (PID 40) exited with exit
> > > > code 1
> > > > 2023-12-21 01:17:46.779 UTC [532857] LOG:  logical replication
> > > > apply worker for subscription "tsub" has started ...
> > > >
> > > > While _subscriber.log from a successful test run contains:
> > > > 2024-01-26 03:49:07.065 UTC [9726:5] 014_binary.pl LOG:  statement:
> > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
> > > > 2024-01-26 03:49:07.075 UTC [9726:6] 014_binary.pl LOG: disconnection:
> > > > session time: 0:00:00.014 user=postgres database=postgres
> > > > host=[local]
> > > > 2024-01-26 03:49:07.558 UTC [9729:1] LOG:  logical replication
> > > > apply worker for subscription "tsub" has started
> > > > 2024-01-26 03:49:07.563 UTC [9731:1] LOG:  logical replication
> > > > table synchronization worker for subscription "tsub", table
> > > > "test_mismatching_types" has started
> > > > 2024-01-26 03:49:07.585 UTC [9731:2] ERROR:  incorrect binary data
> > > > format
> > > > 2024-01-26 03:49:07.585 UTC [9731:3] CONTEXT:  COPY
> > > > test_mismatching_types, line 1, column a
> > > >
> > > > In this case, "logical replication apply worker for subscription
> > > > "tsub" has started" appears just after "ALTER SUBSCRIPTION", not 3
> > > > minutes
> > > later.
> > > >
> > > > I've managed to reproduce this failure locally by running multiple
> > > > tests in parallel, and my analysis shows that it is caused by a
> > > > race condition when accessing variable table_states_valid inside
> tablesync.c.
> > > >
> > > > tablesync.c does the following with table_states_valid:
> > > > /*
> > > >   * Callback from syscache invalidation.
> > > >   */
> > > > void
> > > > invalidate_syncing_table_states(Datum arg, int cacheid, uint32
> > > > hashvalue) {
> > > >  table_states_valid = false;
> > > > }
> > > > ...
> > > > static bool
> > > > FetchTableStates(bool *started_tx) { ...
> > > >  if (!table_states_valid)
> > > >  {
> > > > ...
> > > >  /* Fetch all non-ready tables. */
> > > >  rstates = GetSubscriptionRelations(MySubscription->oid,
> > > > true); ...
> > > >  table_states_valid = true;
> > > >  }
> > > >
> > > > So, when syscache invalidation occurs inside the code block "if
> > > > (!table_states_valid)", that invalidation is effectively ignored.
> > > >
> > > > In a failed case I observe the following events:
> > > > 1. logical replication apply worker performs
> > > >   LogicalRepApplyLoop() -> process_syncing_tables() ->
> > > >   process_syncing_tables_for_apply() -> FetchTableStates() periodically.
> > > >
> > > > 2. ALTER SUBSCRIPTION tsub REFRESH PUBLICATION invalidates
> syscache
> > > >   for SUBSCRIPTIONRELMAP, and that leads to calling
> > > >   invalidate_syncing_table_states().
> > > >
> > > > 3. If the apply worker manages to fetch no non-ready tables in
> > > >   FetchTableStates() and ignore "table_states_valid = false" from
> > > >   invalidate_syncing_table_states(), then it just misses the 
> > > > invalidation
> > > 

Re: Supporting MERGE on updatable views

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-29, Dean Rasheed wrote:

> Yeah, for all practical purposes, that check in CheckValidResultRel()
> has been dead code since d751ba5235, but I think it's still worth
> doing, and if we're going to do it, we should do it properly. I don't
> like using elog() in some cases and ereport() in others, but I also
> don't like having that much duplicated code between this and the
> rewriter (and this patch doubles the size of that block).
> 
> A neater solution is to export the rewriter functions and use them in
> CheckValidResultRel(). All these checks can then be reduced to
> 
> if (!view_has_instead_trigger(...))
> error_view_not_updatable(...)
> 
> which eliminates a lot of duplicated code and means that we now have
> just one place that throws these errors.

This looks quite nice, thanks.  LGTM.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo." (Augusto Pinochet a una corte de justicia)




Re: Question on LWLockMode in dsa.c

2024-01-30 Thread Masahiko Sawada
On Tue, Jan 30, 2024 at 4:43 PM Bharath Rupireddy
 wrote:
>
> On Tue, Jan 30, 2024 at 6:24 AM Masahiko Sawada  wrote:
> >
> > Hi,
> >
> > While working on radix tree patch[1], John Naylor found that dsa.c
> > doesn't already use shared locks even in dsa_dump(). dsa_dump() seems
> > a pure read-only function so I thought we could use a shared lock mode
> > there. Is there any reason to use exclusive mode even in dsa_dump()?
>
> check_for_freed_segments_locked which is called from dsa_dump does
> mark a few variables of the dsa area to NULL right?

Right. But IIUC these are backend-local variables, and we don't change
any shared variables.

> Are you suggesting
> to take initially a shared lock in dsa_dump and upgrade it to
> exclusive mode in the unlikely event of
> unlikely(area->freed_segment_counter != freed_segment_counter)
> occurring in check_for_freed_segments_locked?

I thought we could just take a shared lock in dsa_dump().

>
> > Ultimately, since we're trying to add a new function
> > dsa_get_total_size() that just returns
> > dsa_area_control.total_segment_size and therefore would also be a
> > read-only function, I'd like to find out the correct lock mode there.
> >
> > [1] 
> > https://www.postgresql.org/message-id/CANWCAZYYzoKp_4%2B1m5mn-TRD62BTwom8iLXLOWMsHkkwFi%3Drzg%40mail.gmail.com
>
> Do you see any lock contention or any other issues with exclusive
> locks on dsa areas?

Not particularly, for now. But we shouldn't take an exclusive lock
unnecessarily, and even a shared lock looks safe at a glance. So I
need to convince myself that an exclusive lock is really needed there
before adding a new dsa_get_total_size() function. And we might need
comments.
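
For concreteness, the kind of read-only accessor being discussed might look roughly like the sketch below. It assumes dsa.c's internal DSA_AREA_LOCK macro and the total_segment_size field of dsa_area_control, and would have to live in dsa.c; it is a sketch, not the actual patch:

```
/* Sketch only: would sit in dsa.c next to dsa_dump(). */
size_t
dsa_get_total_size(dsa_area *area)
{
	size_t		size;

	/* Purely read-only, so a shared lock looks sufficient here. */
	LWLockAcquire(DSA_AREA_LOCK(area), LW_SHARED);
	size = area->control->total_segment_size;
	LWLockRelease(DSA_AREA_LOCK(area));

	return size;
}
```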

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Incorrect cost for MergeAppend

2024-01-30 Thread Aleksander Alekseev
Hi,

> Here is a small patch that reproduces the problem on two tables with
> inheritance, and fixes it. I'll add it to the Commitfest.

Thanks for the patch.

I can confirm that it changes the plan from Sort + Append to MergeAppend.

Before:

```
explain (costs off) select * from ma0 where a < 1000 order by a;
   QUERY PLAN
-
 Sort
   Sort Key: ma0.a
   ->  Append
 ->  Index Only Scan using ma0_pkey on ma0 ma0_1
   Index Cond: (a < 1000)
 ->  Seq Scan on ma1 ma0_2
   Filter: (a < 1000)
```

After:

```
=# explain (costs off) select * from ma0 where a < 1000 order by a;
QUERY PLAN
---
 Merge Append
   Sort Key: ma0.a
   ->  Index Only Scan using ma0_pkey on ma0 ma0_1
 Index Cond: (a < 1000)
   ->  Sort
 Sort Key: ma0_2.a
 ->  Seq Scan on ma1 ma0_2
   Filter: (a < 1000)
```

The rest of the tests pass.

-- 
Best regards,
Aleksander Alekseev




Re: Incorrect cost for MergeAppend

2024-01-30 Thread David Rowley
On Wed, 31 Jan 2024 at 00:06, Alexander Kuzmenkov
 wrote:
> Here is a small patch that reproduces the problem on two tables with
> inheritance, and fixes it. I'll add it to the Commitfest.

Good catch.

It seems to have been broken since MergeAppends were added in 11cad29c9.

The code fix looks good to me.

The tests look a bit heavy.  I wondered if you could have used a UNION
ALL query with the tenk1 table so you didn't have to create tables,
however, I see that case works ok since the parent->tuples is set in
set_subquery_size_estimates() from "rel->tuples =
sub_final_rel->cheapest_total_path->rows;".

I played around with the following and see your patch changes the plan
to a Merge Append away from an Append -> Sort plan.

create table tenk_parent (like tenk1);
alter table tenk1 inherit tenk_parent;
alter table tenk2 inherit tenk_parent;
explain (costs off) select * from tenk_parent where thousand = 0 order
by tenthous;
alter table tenk1 no inherit tenk_parent;
alter table tenk2 no inherit tenk_parent;

I'm just not sure if we should be messing with tenk1 and tenk2 like that.

You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.

David




scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Alvaro Herrera
I noticed while answering a question that commit b577743000cd added the
GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
the PQparameterStatus documentation.

Here's a proposed patch to add it there.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d0d5aefadc..13bd82efc6 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2521,7 +2521,8 @@ const char *PQparameterStatus(const PGconn *conn, const 
char *paramName);
DateStyle,
IntervalStyle,
TimeZone,
-   integer_datetimes, and
+   integer_datetimes,
+   scram_iterations, and
standard_conforming_strings.
(server_encoding, TimeZone, and
integer_datetimes were not reported by releases 
before 8.0;


Further thoughts:

1. that list looks to be in random order.  Should we sort it
alphabetically?

2. the notes about the versions in which some parameters started to be
reported, look quite outdated.  We don't really care about things not
reported in 8.0 or 8.1 or even 9.0.  For all purposes, it seems
perfectly OK to say that these parameters have been reported forever
(i.e. don't mention them in these lists).  I think we should remove all
those, except the note about version 14.

3. Should we list scram_iterations as having started to be reported with
version 16?  The GUC didn't exist before that; but we could say that if
it's not reported, then the application can assume that the value is
4096 (similar to the wording for standard_conforming_strings).

Thanks

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los cuentos de hadas no dan al niño su primera idea sobre los monstruos.
Lo que le dan es su primera idea de la posible derrota del monstruo."
   (G. K. Chesterton)




Re: Rename setup_cancel_handler in pg_dump

2024-01-30 Thread Daniel Gustafsson
> On 26 Jan 2024, at 01:42, Yugo NAGATA  wrote:

> I am proposing it because there is a public function with
> the same name in fe_utils/cancel.c. I know pg_dump/parallel.c
> does not include fe_utils/cancel.h, so there is no conflict,
> but I think it is better to use different names to reduce
> possible confusion. 

Given that a "git grep setup_cancel_handler" returns hits in pg_dump along with
other frontend utils, I can see the risk of confusion.

-setup_cancel_handler(void)
+pg_dump_setup_cancel_handler(void)

We don't have any other functions prefixed with pg_dump_; based on the naming
of the surrounding code in the file, I wonder if set_cancel_handler would be a
more appropriate name?

--
Daniel Gustafsson





Re: Functions to return random numbers in a given range

2024-01-30 Thread Aleksander Alekseev
Hi,

> Interestingly, the cfbot didn't pick up on the fact that it needed
> rebasing. Anyway, the copyright years in the new file's header comment
> needed updating, so here is a rebase doing that.

Maybe I'm missing something, but I'm not sure I understand what this
test is testing in particular:

```
-- There should be no triple duplicates in 1000 full-range 32-bit random()
-- values.  (Each of the C(1000, 3) choices of triplets from the 1000 values
-- has a probability of 1/(2^32)^2 of being a triple duplicate, so the
-- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which
-- is roughly 9e-12.)
SELECT r, count(*)
FROM (SELECT random(-2147483648, 2147483647) r
  FROM generate_series(1, 1000)) ss
GROUP BY r HAVING count(*) > 2;
```

The intent seems to be to check the fact that random numbers are
distributed evenly. If this is the case I think the test is wrong. The
sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4,
12, 45 and every particular sequence has low probability. All in all
personally I would argue that this is a meaningless test that just
fails with a low probability. Same for the tests that follow below.

The proper way of testing PRNG would be to call setseed() and compare
return values with expected ones. I don't mind testing the proposed
invariants but they should do this after calling setseed(). Currently
the patch places the tests right before the call.

-- 
Best regards,
Aleksander Alekseev




Re: Change GUC hashtable to use simplehash?

2024-01-30 Thread Ants Aasma
On Tue, 30 Jan 2024 at 12:04, John Naylor  wrote:
>
> On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma  wrote:
> > But given that we know the data length and we have it in a register
> > already, it's easy enough to just mask out data past the end with a
> > shift. See patch 1. Performance benefit is about 1.5x Measured on a
> > small test harness that just hashes and finalizes an array of strings,
> > with a data dependency between consecutive hashes (next address
> > depends on the previous hash output).
>
> Interesting work! I've taken this idea and (I'm guessing, haven't
> tested) improved it by re-using an intermediate step for the
> conditional, simplifying the creation of the mask, and moving the
> bitscan out of the longest dependency chain. Since you didn't attach
> the test harness, would you like to run this and see how it fares?
> (v16-0001 is same as your 0001, and v16-0002 builds upon it.) I plan
> to test myself as well, but since your test tries to model true
> latency, I'm more interested in that one.

It didn't calculate the same result because the if (mask) condition
was incorrect. Changed it to if (chunk & 0xFF) and removed the right
shift from the mask. It seems to be half a nanosecond faster, but as I
don't have a machine set up for microbenchmarking it's quite close to
measurement noise.
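
As a toy, standalone illustration of the general masking idea under discussion (not the patch code itself): when the string length is already known, the bytes of a full-word load that lie past the end of the string can be cleared with a shift-derived mask, e.g. on a little-endian machine:

```
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Zero the bytes of a little-endian 64-bit chunk beyond 'len' (1..7),
 * so that a full-word load of a short string hashes only the real data.
 */
static uint64_t
mask_tail(uint64_t chunk, int len)
{
	return chunk & ((UINT64_C(1) << (len * 8)) - 1);
}

int
main(void)
{
	char		buf[8] = "abc";		/* 3 real bytes, rest is padding */
	uint64_t	chunk;

	buf[5] = 0x7f;					/* simulate junk past the terminator */
	memcpy(&chunk, buf, sizeof(chunk));

	printf("raw:    %016llx\n", (unsigned long long) chunk);
	printf("masked: %016llx\n", (unsigned long long) mask_tail(chunk, 3));
	return 0;
}
```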

I didn't post the harness as it's currently so messy to be near
useless to others. But if you'd like to play around,  I can tidy it up
a bit and post it.

> > Not sure if the second one is worth the extra code.
>
> I'd say it's not worth optimizing the case we think won't be taken
> anyway. I also like having a simple path to assert against.

Agreed.

As an addendum, I couldn't resist trying out 256-bit vectors with two
parallel AES hashes running, using unaligned loads and special-casing
loads that straddle a page boundary. Requires -march=x86-64-v3 -maes.
About 20% faster than fasthash on short strings, 2.2x faster on 4k
strings. Right now it requires 4-byte alignment (it uses vpmaskmovd),
but it could be made to work with any alignment.

Regards,
Ants Aasma
#include 
#include 

#define PAGE_SIZE 0x1000

uint64_t
fast_vec_hash_cstring_avx2(char *buf)
{
__m128i hash0 = {0, 0};
__m128i hash1 = {0, 0};

__m128i k0 = {0x0807060504030201, 0x100F0E0D0C0B0A09};
__m128i k1 = {0x1117161514131211, 0x201F1E1D1C1B1A19};

char *cur = buf;

int mask;
__m256i chunk;
int offset = (uintptr_t) buf & (sizeof(chunk) - 1);
int endpos;


do {

        char *end_of_page = (char*) ((((uintptr_t) cur) | (PAGE_SIZE-1)) + 1);
for (; cur + sizeof(chunk) <= end_of_page; cur += sizeof(chunk))
{
chunk = _mm256_loadu_si256((__m256i*) cur);
__m256i ends = _mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(0));
mask = _mm256_movemask_epi8(ends);
if (mask)
goto last_iteration;
hash0 = _mm_aesenc_si128(hash0, k0);
hash1 = _mm_aesenc_si128(hash1, k1);
hash0 = _mm_aesenc_si128(hash0, _mm256_extracti128_si256(chunk, 0));
hash1 = _mm_aesenc_si128(hash1, _mm256_extracti128_si256(chunk, 1));
}
if (offset)
{
__m256i load_mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(offset / 4), _mm256_setr_epi32(0,1,2,3,4,5,6,7));
chunk = _mm256_maskload_epi32((const int*) cur, load_mask);
__m256i ends = load_mask & _mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(0));
mask = _mm256_movemask_epi8(ends);
if (mask)
goto last_iteration;
chunk |= _mm256_maskload_epi32((const int*) cur, load_mask);
ends = load_mask & _mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(0));
mask = _mm256_movemask_epi8(ends);
if (mask)
goto last_iteration;
hash0 = _mm_aesenc_si128(hash0, k0);
hash1 = _mm_aesenc_si128(hash1, k1);
hash0 = _mm_aesenc_si128(hash0, _mm256_extracti128_si256(chunk, 0));
hash1 = _mm_aesenc_si128(hash1, _mm256_extracti128_si256(chunk, 1));
cur += sizeof(chunk);
}
} while(1);


last_iteration:
// chunk contains data, mask contains location of end of line
endpos = _tzcnt_u32(mask);
    /* zero the bytes at and after the terminator before the final rounds */
    chunk &= _mm256_cmpgt_epi8(_mm256_set1_epi8(endpos), _mm256_setr_epi8(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31));
hash0 = _mm_aesenc_si128(hash0, k0);
hash1 = _mm_aesenc_si128(hash1, k1);
hash0 = _mm_aesenc_si128(hash0, _mm256_extracti128_si256(chunk, 0));
hash1 = _mm_aesenc_si128(hash1, _mm256_extracti128_si256(chunk, 1));

hash0 = _mm_aesenc_si128(hash0, k0);
hash1 = _mm_aesenc_si128(hash1, k1);
hash0 = _mm_aesenc_si128(hash0, k1);
hash1 = _mm_aesenc_si128(hash1, k0);
hash0 = _mm_aesenc_si128(hash0, k0);
hash1 = _mm_aesenc_si128(hash1, k1);

__m128i intermediate = hash1 ^ hash0;
    return intermediate[1] ^ intermediate[0];
}

Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
Hi

so 6. 1. 2024 v 16:51 odesílatel vignesh C  napsal:

> On Fri, 21 Jul 2023 at 02:59, Ivan Panchenko  wrote:
> >
> > Friday, 14 July 2023, 23:27 +03:00 от Tom Lane :
> >
> > =?UTF-8?B?SXZhbiBQYW5jaGVua28=?=  writes:
> > > Четверг, 6 июля 2023, 14:48 +03:00 от Peter Eisentraut <
> pe...@eisentraut.org >:
> > >> If the transform deals with a built-in type, then they should just be
> > >> added to the respective pl extension directly.
> >
> > > The new extension bytea_plperl can be easily moved into plperl now,
> but what should be do with the existing ones, namely jsonb_plperl and
> bool_plperl ?
> > > If we leave them where they are, it would be hard to explain why some
> transforms are inside plperl while other ones live separately. If we move
> them into plperl also, wouldn’t it break some compatibility?
> >
> > It's kind of a mess, indeed. But I think we could make plperl 1.1
> > contain the additional transforms and just tell people they have
> > to drop the obsolete extensions before they upgrade to 1.1.
> > Fortunately, it doesn't look like functions using a transform
> > have any hard dependency on the transform, so "drop extension
> > jsonb_plperl" followed by "alter extension plperl update" should
> > work without cascading to all your plperl functions.
> >
> > Having said that, we'd still be in the position of having to
> > explain why some transforms are packaged with plperl and others
> > not. The distinction between built-in and contrib types might
> > be obvious to us hackers, but I bet a lot of users see it as
> > pretty artificial. So maybe the existing packaging design is
> > fine and we should just look for a way to reduce the code
> > duplication.
> >
> > The code duplication between different transforms is not in C code, but
> mostly in copy-pasted Makefile, *.control, *.sql and meson-build. These
> files could be generated from some universal templates. But, keeping in
> mind Windows compatibility and presence of three build system, this hardly
> looks like a simplification.
> > Probably at present time it would be better to keep the existing code
> duplication until plperl 1.1.
> > Nevertheless, dealing with code duplication is a wider task than the
> bytea transform, so let me suggest to keep this extension in the present
> form. If we decide what to do with the duplication, it would be another
> patch.
> >
> > The mesonified and rebased version of the transform patch is attached.
>
> The patch needs to be rebased as these changes are not required anymore:
> diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
> index 9e05eb91b1..ec0a3f8097 100644
> --- a/src/tools/msvc/Mkvcbuild.pm
> +++ b/src/tools/msvc/Mkvcbuild.pm
> @@ -43,7 +43,7 @@ my $contrib_extralibs = { 'libpq_pipeline' =>
> ['ws2_32.lib'] };
>  my $contrib_extraincludes = {};
>  my $contrib_extrasource = {};
>  my @contrib_excludes = (
> -   'bool_plperl', 'commit_ts',
> +   'bool_plperl', 'bytea_plperl', 'commit_ts',
> 'hstore_plperl', 'hstore_plpython',
> 'intagg', 'jsonb_plperl',
> 'jsonb_plpython', 'ltree_plpython',
> @@ -791,6 +791,9 @@ sub mkvcbuild
> my $bool_plperl = AddTransformModule(
> 'bool_plperl', 'contrib/bool_plperl',
> 'plperl', 'src/pl/plperl');
> +   my $bytea_plperl = AddTransformModule(
> +   'bytea_plperl', 'contrib/bytea_plperl',
> +   'plperl',  'src/pl/plperl');
>
> Regards,
> Vignesh
>
>
>
I have checked this patch and it looks good. All tests passed. I am sending a
cleaned-up patch.

I did some minor formatting cleanup.

I added Perl reference support - hstore_plperl and jsonb_plperl do the same.

+	/* Dereference references recursively. */
+	while (SvROK(in))
+		in = SvRV(in);

Regards

Pavel
From 340b945f06335088d22280755923e2c719e4b598 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Tue, 30 Jan 2024 10:31:00 +0100
Subject: [PATCH] supports bytea transformation for plperl

---
 contrib/Makefile  |  4 +-
 contrib/bytea_plperl/.gitignore   |  4 ++
 contrib/bytea_plperl/Makefile | 39 ++
 contrib/bytea_plperl/bytea_plperl--1.0.sql| 19 +++
 contrib/bytea_plperl/bytea_plperl.c   | 48 +
 contrib/bytea_plperl/bytea_plperl.control |  7 +++
 contrib/bytea_plperl/bytea_plperlu--1.0.sql   | 19 +++
 contrib/bytea_plperl/bytea_plperlu.control|  6 +++
 .../bytea_plperl/expected/bytea_plperl.out| 36 +
 .../bytea_plperl/expected/bytea_plperlu.out   | 36 +
 contrib/bytea_plperl/meson.build  | 51 +++
 contrib/bytea_plperl/sql/bytea_plperl.sql | 22 
 contrib/bytea_plperl/sql/bytea_plperlu.sql| 22 
 contrib/meson.build   |  1 +
 doc/src/sgml/plperl.sgml  | 16 ++
 15 files changed, 328 insertions(+), 2 dele

Re: scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Daniel Gustafsson
> On 30 Jan 2024, at 13:36, Alvaro Herrera  wrote:
> 
> I noticed while answering a question that commit b577743000cd added the
> GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
> the PQparameterStatus documentation.

Ugh, thanks for fixing!

> 1. that list looks to be in random order.  Should we sort it
> alphabetically?

It seems to have some semblance of grouping per GUC functionality, but not
quite.  I'm not sure sorting alphabetically will improve the current state
though.

> 2. the notes about the versions in which some parameters started to be
> reported, look quite outdated.  We don't really care about things not
> reported in 8.0 or 8.1 or even 9.0.  For all purposes, it seems
> perfectly OK to say that these parameters have been reported forever
> (i.e. don't mention them in these lists).  I think we should remove all
> those, except the note about version 14.

Agreed.

> 3. Should we list scram_iterations as having started to be reported with
> version 16?

Yes, similar to in_hot_standby for v14.

> The GUC didn't exist before that; but we could say that if
> it's not reported, then the application can assume that the value is
> 4096 (similar to the wording for standard_conforming_strings).

There is no real practical use for knowing the value if it's not reported,
since there isn't anything the user can do differently knowing that.  I would
leave that out to avoid confusion, but I don't have strong feelings if you
think it should be added.

--
Daniel Gustafsson





Re: scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Jelte Fennema-Nio
On Tue, 30 Jan 2024 at 13:37, Alvaro Herrera  wrote:
>
> I noticed while answering a question that commit b577743000cd added the
> GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
> the PQparameterStatus documentation.

+1 on the improvements you're suggesting (although I don't know enough
about point 3 to be sure)

One important note though is that this list is tracked in two
different places, so both of these places should be updated:
- doc/src/sgml/protocol.sgml
- doc/src/sgml/libpq.sgml




pg_stat_advisor extension

2024-01-30 Thread Илья Евдокимов
Hello PostgreSQL Hackers,

I'm thrilled to share with you a new PostgreSQL extension I've developed, called 'pg_stat_advisor'. The genesis of this extension traces back to a conversation in this thread: https://www.postgresql.org/message-id/e2512fd5-77a4-825b-e456-c0586e37f293%40enterprisedb.com

The 'pg_stat_advisor' extension is designed to help optimize query plans. It operates by suggesting when to create extended statistics, particularly in queries where the current selectivity estimates fall short. This is controlled through the GUC parameter 'pg_stat_advisor.suggest_statistics_threshold', which assesses the ratio of total tuples to planned rows. This helps identify scenarios where the planner's estimates could be improved. A key strength of 'pg_stat_advisor' lies in its ability to recommend extended statistics, significantly bolstering query planning.

You can enable the extension with:

LOAD 'pg_stat_advisor';
SET pg_stat_advisor.suggest_statistics_threshold = 1.0;

Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 from generate_series(1, 100) i;
ANALYZE t;

EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
NOTICE:  pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..11675.10 rows=1 width=8) (actual time=0.400..59.292 rows=10 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..10675.00 rows=1 width=8) (actual time=35.614..54.291 rows=3 loops=3)
         Filter: ((i = 100) AND (j = 10))
         Rows Removed by Filter: 30
 Planning Time: 0.081 ms
 Execution Time: 59.413 ms
(8 rows)

After the EXPLAIN ANALYZE command you can see the message suggesting the creation of statistics named 't_i_j' on the 'i', 'j' columns of table 't'.

I look forward to your thoughts, feedback, and any suggestions you might have.

Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From 9485605416030e79843feabf9c101a88703b9779 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Tue, 30 Jan 2024 13:49:07 +0300
Subject: [PATCH] 'pg_stat_advisor' extension.

This serves as a hook into the executor for determining total rows and planned rows.
The process starts by checking the `pg_stat_advisor.suggest_statistics_threshold` GUC parameter.
If it's set to 0.0, the extension does not proceed further. When the parameter is greater than 0.0,
the extension evaluates the accuracy of planned rows. A suggestion for creating statistics is made
if the ratio of total rows to planned rows is greater than or equal to this threshold.

Only then does it extract the relation and columns from the query.
The extension checks pg_statistic_ext for existing relevant statistics. If no statistics are found,
it prints a notice suggesting the creation of statistics, using the naming format 'relationName_columns'.

Author: Ilia Evdokimov
---
 contrib/Makefile  |   1 +
 contrib/meson.build   |   1 +
 contrib/pg_stat_advisor/.gitignore|   3 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 contrib/pg_stat_advisor/README.md |  85 
 .../expected/pg_stat_advisor.out  |  96 
 contrib/pg_stat_advisor/meson.build   |  30 ++
 contrib/pg_stat_advisor/pg_stat_advisor.c | 477 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  50 ++
 9 files changed, 763 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/.gitignore
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/README.md
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..da9a4ceeaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..a20d99443b 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -49,6 +49,7 @@ subdir('pgcrypto')
 subdir('pg_freespacemap')
 subdir('pg_prewarm')
 subdir('pgrowlocks')
+subdir('pg_stat_advisor')
 subdir('pg_stat_statements')
 subdir('pgstattuple')
 subdir('pg_surgery')
diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore
new file mode 100644
index 00..913175ff6e
--- /dev/null
+++ b/contrib/pg_stat_advisor/.gitignore
@@ -0,0 +1,3 @@
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 

Re: Incorrect cost for MergeAppend

2024-01-30 Thread Alexander Kuzmenkov
On Tue, Jan 30, 2024 at 1:20 PM David Rowley  wrote:
> You should likely focus on trying to find a test that does not require
> making 2 tables with 10k rows.

Is 1k smallint OK? It should fit in one page. Still reproduces the
error, and the entire test case runs in under 10 ms.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8dbf790e89..2e1ec41a54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1470,7 +1470,7 @@ create_merge_append_path(PlannerInfo *root,
 	  root,
 	  pathkeys,
 	  subpath->total_cost,
-	  subpath->parent->tuples,
+	  subpath->rows,
 	  subpath->pathtarget->width,
 	  0.0,
 	  work_mem,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 56a40d50f9..780499 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1716,6 +1716,28 @@ order by t1.b limit 10;
 reset enable_nestloop;
 drop table matest0 cascade;
 NOTICE:  drop cascades to table matest1
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a smallint primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1000);
+insert into ma1 select generate_series(1, 1000);
+analyze ma0;
+analyze ma1;
+explain (costs off) select * from ma0 where a < 100 order by a;
+QUERY PLAN 
+---
+ Merge Append
+   Sort Key: ma0.a
+   ->  Index Only Scan using ma0_pkey on ma0 ma0_1
+ Index Cond: (a < 100)
+   ->  Sort
+ Sort Key: ma0_2.a
+ ->  Seq Scan on ma1 ma0_2
+   Filter: (a < 100)
+(8 rows)
+
+drop table ma0 cascade;
+NOTICE:  drop cascades to table ma1
 --
 -- Test merge-append for UNION ALL append relations
 --
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 971aac54fd..dae8cc7dbe 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -624,6 +624,19 @@ reset enable_nestloop;
 
 drop table matest0 cascade;
 
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a smallint primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1000);
+insert into ma1 select generate_series(1, 1000);
+analyze ma0;
+analyze ma1;
+
+explain (costs off) select * from ma0 where a < 100 order by a;
+
+drop table ma0 cascade;
+
+
 --
 -- Test merge-append for UNION ALL append relations
 --


Re: Improvement discussion of custom and generic plans

2024-01-30 Thread Quan Zongliang



On 2023/11/3 15:27, Quan Zongliang wrote:

Hi

We have one such problem. A table field has skewed data. Statistics:
n_distinct | -0.4481973
most_common_vals   | {5f006ca25b52ed78e457b150ee95a30c}
most_common_freqs  | {0.5518474}


Data generation:

CREATE TABLE s_user (
  user_id varchar(32) NOT NULL,
  corp_id varchar(32),
  status int NOT NULL
  );

insert into s_user
select md5('user_id ' || a), md5('corp_id ' || a),
  case random()<0.877675 when true then 1 else -1 end
   FROM generate_series(1,10031) a;

insert into s_user
select md5('user_id ' || a), md5('corp_id 10032'),
  case random()<0.877675 when true then 1 else -1 end
   FROM generate_series(10031,22383) a;

CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id);

analyze s_user;


1. First, define a PREPARE statement:
prepare stmt as select count(*) from s_user where status=1 and corp_id = $1;


2. Run it five times. The custom plan is chosen.
explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c');

Here's the plan:
  Aggregate  (cost=639.84..639.85 rows=1 width=8) (actual time=4.653..4.654 rows=1 loops=1)
    Buffers: shared hit=277
    ->  Seq Scan on s_user  (cost=0.00..612.76 rows=10830 width=0) (actual time=1.402..3.747 rows=10836 loops=1)
          Filter: ((status = 1) AND ((corp_id)::text = '5f006ca25b52ed78e457b150ee95a30c'::text))
          Rows Removed by Filter: 11548
          Buffers: shared hit=277
  Planning Time: 0.100 ms
  Execution Time: 4.674 ms
(8 rows)

3. From the sixth execution on, the generic plan is chosen.
We can see that there is a huge deviation between the estimate and the
actual value:
  Aggregate  (cost=11.83..11.84 rows=1 width=8) (actual time=4.424..4.425 rows=1 loops=1)
    Buffers: shared hit=154 read=13
    ->  Bitmap Heap Scan on s_user  (cost=4.30..11.82 rows=2 width=0) (actual time=0.664..3.371 rows=10836 loops=1)
          Recheck Cond: ((corp_id)::text = $1)
          Filter: (status = 1)
          Rows Removed by Filter: 1517
          Heap Blocks: exact=154
          Buffers: shared hit=154 read=13
          ->  Bitmap Index Scan on s_user_corp_id_idx  (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1)
                Index Cond: ((corp_id)::text = $1)
                Buffers: shared read=13
  Planning Time: 0.246 ms
  Execution Time: 4.490 ms
(13 rows)

This is because in the choose_custom_plan function, the generic plan is 
attempted after executing the custom plan five times.


     if (plansource->num_custom_plans < 5)
     return true;

The generic plan uses var_eq_non_const to estimate the average selectivity.

These are facts that many people already know, so this is just a brief introduction.
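
A minimal standalone model of this switch (not the plancache.c code itself; plan_cache_mode, cursor options and the planning-cost surcharge are ignored): custom plans for the first five executions, then the generic plan once its estimate drops below the average custom-plan cost.

```
#include <stdbool.h>
#include <stdio.h>

/* A toy stand-in for CachedPlanSource, not the real struct. */
typedef struct
{
	int			num_custom_plans;
	double		total_custom_cost;
	double		generic_cost;
} ToyPlanSource;

/* Simplified model of the choice described above. */
static bool
choose_custom_plan_model(const ToyPlanSource *ps)
{
	double		avg_custom_cost;

	if (ps->num_custom_plans < 5)
		return true;
	avg_custom_cost = ps->total_custom_cost / ps->num_custom_plans;
	if (ps->generic_cost < avg_custom_cost)
		return false;
	return true;
}

int
main(void)
{
	/* Costs taken from the plans shown above. */
	ToyPlanSource ps = {0, 0.0, 11.84};

	for (int i = 1; i <= 7; i++)
	{
		bool		custom = choose_custom_plan_model(&ps);

		printf("execution %d: %s plan\n", i, custom ? "custom" : "generic");
		if (custom)
		{
			ps.num_custom_plans++;
			ps.total_custom_cost += 639.85;
		}
	}
	return 0;
}
```

With these numbers the model prints "custom" for the first five executions and "generic" from the sixth on, which is exactly the behaviour described here.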


Our users actually use such parameter conditions in very complex PREPARE
statements. Once the generic plan is chosen for the sixth execution, the
execution time changes from 5 milliseconds to 5 minutes.



To improve this, the following approaches can be considered:

1. Determine whether data skew exists in the PREPARE statement's parameter
conditions based on the statistics.

However, there is no way to know whether the user will actually use the
skewed parameter.

2. When comparing the cost of the generic plan with the average cost of
the custom plans (in choose_custom_plan), consider whether the maximum
cost of the custom plans executed so far is an order of magnitude
different from the cost of the generic plan.
If the first five executions use a low-selectivity condition and the
sixth uses a high-selectivity one, problems will still arise.


3. Trace the execution time of the PREPARE statement. When an execution
is found to take much longer than the average execution time, force the
custom plan to be used.



Is there any better idea?

I tried to do a demo. It adds a member paramid to Const. When a Const is
generated from a Param, the Const is marked as coming from a Param. Then
var_eq_const checks whether the column in the condition using this
parameter is skewed. If so, choose_custom_plan returns true every time,
forcing the custom plan to be used.
Only conditional expressions such as var = param or param = var are
supported.

If this makes sense, I will continue to improve the patch.


--
Quan Zongliang

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 94eb56a1e7..3384520dc1 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2489,6 +2489,8 @@ eval_const_expressions_mutator(Node *node,
 											  pval,
 											  prm->isnull,
 											  typByVal);
+						if (paramLI->paramFetch == NULL)
+							con-

Re: UUID v7

2024-01-30 Thread Andrey M. Borodin


> On 30 Jan 2024, at 15:33, Junwang Zhao  wrote:
> 
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks. I also fixed a warning found by CFBot.


Best regards, Andrey Borodin.


v17-0001-Implement-UUID-v7.patch
Description: Binary data


Re: Optmize bitmapword macros calc (src/backend/nodes/bitmapset.c)

2024-01-30 Thread Ranier Vilela
Em seg., 29 de jan. de 2024 às 19:40, Nathan Bossart <
nathandboss...@gmail.com> escreveu:

> On Tue, Jan 30, 2024 at 11:23:57AM +1300, David Rowley wrote:
> > On Tue, 30 Jan 2024 at 08:32, Nathan Bossart 
> wrote:
> >> I'm currently +0.1 for this change.  I don't see any huge problem with
> >> trimming a few instructions, but I'm dubious there's any measurable
> impact.
> >> However, a cycle saved is a cycle earned...
> >
> > FWIW, In [1] and subsequent replies, there are several examples of
> > benchmarks where various bitmapset functions are sitting high in the
> > profiles. So I wouldn't be too surprised if such a small change to the
> > WORDNUM and BITNUM macros made a noticeable difference.
>
> Good to know, thanks.  If there is indeed demonstrable improvement, I'd
> readily adjust my +0.1 to +1.
>
Following the suggestions, I did a quick test with one of the scripts.

Ubuntu 64 bits
gcc 12.3 64 bits

create table t1 (a int) partition by list(a);
select 'create table t1_'||x||' partition of t1 for values
in('||x||');' from generate_series(0,9)x;

test1.sql
select * from t1 where a > 1 and a < 3;

pgbench -U postgres -n -f test1.sql -T 15 postgres

head:

tps = 27983.182940
tps = 28916.903038
tps = 29051.878855

patched:

tps = 27517.301246
tps = 27848.684133
tps = 28669.367300


create table t2 (a int) partition by list(a);
select 'create table t2_'||x||' partition of t2 for values
in('||x||');' from generate_series(0,)x;

test2.sql
select * from t2 where a > 1 and a < 3;

pgbench -U postgres -n -f test2.sql -T 15 postgres

head:

tps = 27144.044463
tps = 28932.948620
tps = 29299.016248

patched:

tps = 27363.364039
tps = 28588.141586
tps = 28669.367300

To my complete surprise, the change is slower.
I can't explain how, with fewer instructions, gcc produces a worse binary.
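
For context, WORDNUM and BITNUM map a bit number to a word index and an in-word bit index. The toy standalone check below (an assumption about the patch's intent, not the patch itself) shows that a shift/mask spelling agrees with the current division/modulo macros for non-negative inputs:

```
#include <stdio.h>

#define BITS_PER_BITMAPWORD 64

/* Division/modulo form, as bitmapset.c spells it today. */
#define WORDNUM_DIV(x)		((x) / BITS_PER_BITMAPWORD)
#define BITNUM_MOD(x)		((x) % BITS_PER_BITMAPWORD)

/* Shift/mask form, equivalent only for non-negative x. */
#define WORDNUM_SHIFT(x)	((x) >> 6)
#define BITNUM_MASK(x)		((x) & (BITS_PER_BITMAPWORD - 1))

int
main(void)
{
	for (int x = 0; x < 100000; x++)
	{
		if (WORDNUM_DIV(x) != WORDNUM_SHIFT(x) ||
			BITNUM_MOD(x) != BITNUM_MASK(x))
		{
			printf("mismatch at %d\n", x);
			return 1;
		}
	}
	printf("shift/mask and div/mod agree for non-negative inputs\n");
	return 0;
}
```

The signed division and modulo force the compiler to emit fix-up instructions for possibly-negative inputs, which is presumably the overhead the proposal is trying to shave.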

best regards,
Ranier Vilela


Re: [PATCH] Add native windows on arm64 support

2024-01-30 Thread Andrew Dunstan


On 2024-01-29 Mo 11:20, Dave Cramer wrote:


Dave Cramer
www.postgres.rocks


On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan  wrote:


On 2024-01-26 Fr 09:18, Dave Cramer wrote:



On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan
 wrote:


On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan
 wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for
VS2022 is set up
>>> for x86 builds. AIUI you should start by executing
"vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here.  Based on [1], x64_arm64 means you can
use a x64
> host and you'll be able to produce ARM64 builds, still
these will not
> be able to run on the host where they were built.  How much
of the
> patch posted upthread is required to produce such builds? 
Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]:

https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170


If you look at the table here x86 and x64 are the only
supported host
architectures. But that's OK, the x64 binaries will run on
arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave
and I would
not have got as far as we have. But you want the x64_arm64
argument to
vcvarsall so you will get ARM64 output.


I've rebuilt it using  x64_arm64 and with the attached (very
naive patch) and I still get an x64 binary :(



With this patch I still get a build error, but it's different :-)


[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
   Creating library src\backend\postgres.exe.lib

storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external
symbol spin_delay referenced in function perform_spin_delay

src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals


Did you add the latest lock.patch ?





I'm a bit confused about exactly what needs to be applied. Can you 
supply a complete patch to be applied to a pristine checkout that will 
let me build?



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: compiling postgres on windows arm using meson

2024-01-30 Thread Nazir Bilal Yavuz
Hi,

On Thu, 18 Jan 2024 at 05:07, Dave Cramer  wrote:
>
> Greetings,
> Getting the following error
>
> [1146/2086] Generating src/backend/postgres.def with a custom command 
> (wrapped by meson to set PATH)
> FAILED: src/backend/postgres.def
> "C:\Program Files\Meson\meson.exe" "--internal" "exe" "--unpickle" 
> "C:\Users\davec\projects\postgresql\build\meson-private\meson_exe_perl.EXE_53b41ebc2e76cfc92dd6a2af212140770543faae.dat"
> while executing ['c:\\perl\\bin\\perl.EXE', 
> '../src/backend/../tools/msvc_gendef.pl', '--arch', 'aarch64', '--tempdir', 
> 'src/backend/postgres.def.p', '--deffile', 'src/backend/postgres.def', 
> 'src/backend/postgres_lib.a', 'src/common/libpgcommon_srv.a', 
> 'src/port/libpgport_srv.a']
> --- stdout ---
>
> --- stderr ---
> Usage: msvc_gendef.pl --arch  --deffile  --tempdir  
> files-or-directories
> arch: x86 | x86_64
> deffile: path of the generated file
> tempdir: directory for temporary files
> files or directories: object files or directory containing object files
>
> log attached

From the docs [1]: PostgreSQL will only build for the x64 architecture
on 64-bit Windows.

So, I think that is expected.

[1] 
https://www.postgresql.org/docs/current/install-windows-full.html#INSTALL-WINDOWS-FULL-64-BIT

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Functions to return random numbers in a given range

2024-01-30 Thread Dean Rasheed
On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev
 wrote:
>
> Maybe I'm missing something but I'm not sure if I understand what this
> test tests particularly:
>
> ```
> -- There should be no triple duplicates in 1000 full-range 32-bit random()
> -- values.  (Each of the C(1000, 3) choices of triplets from the 1000 values
> -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the
> -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which
> -- is roughly 9e-12.)
> SELECT r, count(*)
> FROM (SELECT random(-2147483648, 2147483647) r
>   FROM generate_series(1, 1000)) ss
> GROUP BY r HAVING count(*) > 2;
> ```
>
> The intent seems to be to check the fact that random numbers are
> distributed evenly. If this is the case I think the test is wrong. The
> sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4,
> 12, 45 and every particular sequence has low probability. All in all
> personally I would argue that this is a meaningless test that just
> fails with a low probability. Same for the tests that follow below.
>

I'm following the same approach used to test the existing random
functions, and the idea is the same. For example, this existing test:

-- There should be no duplicates in 1000 random() values.
-- (Assuming 52 random bits in the float8 results, we could
-- take as many as 3000 values and still have less than 1e-9 chance
-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
SELECT r, count(*)
FROM (SELECT random() r FROM generate_series(1, 1000)) ss
GROUP BY r HAVING count(*) > 1;

If the underlying PRNG were non-uniform, or the method of reduction to
the required range was flawed in some way that reduced the number of
actual possible return values, then the probability of duplicates
would be increased. A non-uniform distribution would probably be
caught by the KS tests, but uniform gaps in the possible outputs might
not be, so I think this test still has value.
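
For anyone who wants to sanity-check the arithmetic in those test comments, here is a small standalone calculation (link with -lm):

```
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		n = 1000.0;

	/*
	 * Expected number of triple duplicates among 1000 uniform 32-bit values:
	 * C(1000, 3) triplets, each one a triple duplicate with probability
	 * 1/(2^32)^2.
	 */
	double		triple = n * (n - 1) * (n - 2) / 6.0 / pow(2.0, 64);

	/*
	 * Birthday-problem approximation: probability of any duplicate among
	 * 1000 draws from 2^52 equally likely values.
	 */
	double		dup = 1.0 - exp(-n * (n - 1) / 2.0 / pow(2.0, 52));

	printf("expected triple duplicates: %g\n", triple);	/* ~9e-12 */
	printf("P(any duplicate, 52 bits):  %g\n", dup);		/* ~1.1e-10 */
	return 0;
}
```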

> The proper way of testing PRNG would be to call setseed() and compare
> return values with expected ones. I don't mind testing the proposed
> invariants but they should do this after calling setseed(). Currently
> the patch places the tests right before the call.
>

There are also new tests of that nature, following the call to
setseed(0.5). They're useful for a quick visual check of the results,
and confirming the expected number of digits after the decimal point
in the numeric case. However, I think those tests are insufficient on
their own.
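
For illustration, the post-setseed() checks are of this shape (a sketch only;
the argument values here are made up, and the real expected output lives in
the regression files):

SELECT setseed(0.5);
-- With a fixed seed the sequence is reproducible, so fixed expected output
-- can be checked in, including the number of decimal digits produced by
-- the numeric variant.
SELECT random(1, 10) FROM generate_series(1, 5);
SELECT random(0.0, 1.0) FROM generate_series(1, 5);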

Regards,
Dean




Re: POC, WIP: OR-clause support for indexes

2024-01-30 Thread jian he
On Tue, Dec 5, 2023 at 6:55 PM Andrei Lepikhov
 wrote:
>
> Here is fresh version with the pg_dump.pl regex fixed. Now it must pass
> buildfarm.

+JumbleState *
+JumbleExpr(Expr *expr, uint64 *queryId)
+{
+ JumbleState *jstate = NULL;
+
+ Assert(queryId != NULL);
+
+ jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+ /* Set up workspace for query jumbling */
+ jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+ jstate->jumble_len = 0;
+ jstate->clocations_buf_size = 32;
+ jstate->clocations = (LocationLen *)
+ palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+ jstate->clocations_count = 0;
+ jstate->highest_extern_param_id = 0;
+
+ /* Compute query ID */
+ _jumbleNode(jstate, (Node *) expr);
+ *queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
+
+ if (*queryId == UINT64CONST(0))
+ *queryId = UINT64CONST(1);
+
+ return jstate;
+}

+/*
+ * Hash function that's compatible with guc_name_compare
+ */
+static uint32
+orclause_hash(const void *data, Size keysize)
+{
+ OrClauseGroupKey   *key = (OrClauseGroupKey *) data;
+ uint64 hash;
+
+ (void) JumbleExpr(key->expr, &hash);
+ hash += ((uint64) key->opno + (uint64) key->exprtype) % UINT64_MAX;
+ return hash;
+}

Correct me if I am wrong:
in orclause_hash, you only want to return a uint32, so why does the
JumbleExpr function return a JumbleState struct?
Here JumbleExpr simply hashes part of a Query struct,
so JumbleExpr's queryId could be confused with the queryId produced by JumbleQuery.

I'm not sure about the purpose of the following:
+ if (*queryId == UINT64CONST(0))
+ *queryId = UINT64CONST(1);

even if *queryId is 0,
`hash += ((uint64) key->opno + (uint64) key->exprtype) % UINT64_MAX;`
will make the hash return non-zero?

+ MemSet(&info, 0, sizeof(info));
I am not sure this is necessary.

Some comments on OrClauseGroupEntry would be great.

It seems there is no documentation.

create or replace function retint(int) returns int as
$func$
begin return $1 + round(10 * random()); end
$func$ LANGUAGE plpgsql;

set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (tenthous * retint(1) = NULL OR tenthous *
retint(1) = 3) OR thousand = 41;

returns:
QUERY PLAN
---
 Aggregate
   ->  Seq Scan on tenk1
 Filter: (((thousand = 42) AND ((tenthous * retint(1)) = ANY
('{NULL,3}'::integer[]))) OR (thousand = 41))
(3 rows)

Based on the query plan, retint is executed only once, but here it should be
executed twice?
Maybe we need to use contain_volatile_functions() to check the other
parts of the operator expression.
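
For what it's worth, the semantic difference is easy to see with any volatile
expression (just a sketch, not taken from the patch's regression tests):

-- The OR form evaluates the volatile expression once per arm ...
SELECT (random() * 10)::int = 5 OR (random() * 10)::int = 6
FROM generate_series(1, 3);
-- ... while the rewritten form evaluates it only once per row, so the two
-- queries are not equivalent when the left-hand side is volatile.
SELECT (random() * 10)::int = ANY (ARRAY[5, 6])
FROM generate_series(1, 3);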

+ if (IsA(leftop, Const))
+ {
+ opno = get_commutator(opno);
+
+ if (!OidIsValid(opno))
+ {
+ /* Commuter doesn't exist, we can't reverse the order */
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ nconst_expr = get_rightop(orqual);
+ const_expr = get_leftop(orqual);
+ }
+ else if (IsA(rightop, Const))
+ {
+ const_expr = get_rightop(orqual);
+ nconst_expr = get_leftop(orqual);
+ }
+ else
+ {
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
Do we need to skip this transformation when the Const's type is anyarray?




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-30 Thread Daniel Verite
vignesh C wrote:

> patching file src/interfaces/libpq/exports.txt
> Hunk #1 FAILED at 191.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/interfaces/libpq/exports.txt.rej
> 
> Please post an updated version for the same.

PFA a rebased version.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From 8cfb82b1e36e96996637948a231ae35b9af1e074 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= 
Date: Tue, 30 Jan 2024 14:38:21 +0100
Subject: [PATCH v6 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml   |  96 ++
 .../libpqwalreceiver/libpqwalreceiver.c   |   1 +
 src/bin/pg_amcheck/pg_amcheck.c   |   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-exec.c| 117 +++---
 src/interfaces/libpq/libpq-fe.h   |   4 +-
 src/interfaces/libpq/libpq-int.h  |   7 +-
 7 files changed, 184 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d0d5aefadc..f7f5a04df6 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
 The PGresult contains a single result 
tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5189,8 +5202,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This 
is discussed
+   in .
   
 
   
@@ -5554,12 +5567,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before retrieving results with PQgetResult.
- This mode selection is effective only for the query currently
- being processed. For more information on the use of
- PQsetSingleRowMode,
- refer to .
+ To enter single-row or chunked modes, call
+ respectively PQsetSingleRowMode
+ or PQsetChunkedRowsMode before retrieving results
+ with PQgetResult.  This mode selection is effective
+ only for the query currently being processed. For more information on the
+ use of these functions refer
+ to .
 
 
 
@@ -5926,10 +5940,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
  
 
- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results in Chunks
 
-  
+  
libpq
single-row mode
   
@@ -5940,13 +5954,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
PGresult.  This can be unworkable for commands
that return a large number of rows.  For such cases, applications can use
 and  
in
-   single-row mode.  In this mode, the result row(s) are
-   returned to the application one at a time, as they are received from the
-   server.
+   single-row mode or chunked 
mode.
+   In these modes, the result row(s) are returned to the application one at a
+   time for the single-row mode and by chunks for the chunked mode, as they
+   are received from the server.
   
 
   
-   To enter single-row mode, call 
+   To enter these modes, call 
+or 
immediately after a successful call of 
(or a sibling function).  This mode selection is effective only for the
currently executing query.  Then call 
@@ -5954,7 +5970,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
linkend="libpq-async"/>.  If the query returns any rows, they are returned
as individual PGresult objects, which look like
normal query results except for having status code
-   PGRES_SINGLE_TUPLE instead of
+   PGRES_SINGLE_TUPLE for the single-row mode and
+   PGRES_TUPLES_CHUNK for the chunked mode, instead of
PGRES_TUPLES_OK.  After the last row, or immediately if
the query returns zero rows, a zero-row object with status
PGRES_TUPLES_OK is returned; this is the signal that no
@@ -5967,9 +5984,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
 
   
-   When using pipeline mode, si

Re: scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Tom Lane
Alvaro Herrera  writes:
> I noticed while answering a question that commit b577743000cd added the
> GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
> the PQparameterStatus documentation.

Why is it GUC_REPORT at all?  I don't see a strong need for that.
(In particular, the report would be delivered too late to be of any
use in client authentication.)

regards, tom lane




Re: scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-30, Jelte Fennema-Nio wrote:

> On Tue, 30 Jan 2024 at 13:37, Alvaro Herrera  wrote:
> >
> > I noticed while answering a question that commit b577743000cd added the
> > GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
> > the PQparameterStatus documentation.
> 
> +1 to the improvements you're suggesting (although I don't know enough
> about 3 to be sure)
> 
> One important note though is that this list is tracked in two
> different places, so both of these places should be updated:
> - doc/src/sgml/protocol.sgml
> - doc/src/sgml/libpq.sgml

Ooh, you're right.  I propose to turn the list into a

which looks _much_ nicer to read, as in the attached screenshot of the
PDF.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
 really, I see PHP as like a strange amalgamation of C, Perl, Shell
 inflex: you know that "amalgam" means "mixture with mercury",
   more or less, right?
 i.e., "deadly poison"
From 3f7009c6d37c890081bd9511d6a50cee17cee0e5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 30 Jan 2024 15:27:47 +0100
Subject: [PATCH v2] Update PQparameterStatus and ParameterStatus docs

Cover scram_iterations, which were missed in 16.

Also turn the list into a  with 2 columns, which is much
nicer to read.
---
 doc/src/sgml/libpq.sgml| 43 ++
 doc/src/sgml/protocol.sgml | 43 ++
 2 files changed, 40 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d0d5aefadc..1d8998efb2 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2509,30 +2509,27 @@ const char *PQparameterStatus(const PGconn *conn, const char *paramName);
   
 
   
-   Parameters reported as of the current release include
-   server_version,
-   server_encoding,
-   client_encoding,
-   application_name,
-   default_transaction_read_only,
-   in_hot_standby,
-   is_superuser,
-   session_authorization,
-   DateStyle,
-   IntervalStyle,
-   TimeZone,
-   integer_datetimes, and
-   standard_conforming_strings.
-   (server_encoding, TimeZone, and
-   integer_datetimes were not reported by releases before 8.0;
-   standard_conforming_strings was not reported by releases
-   before 8.1;
-   IntervalStyle was not reported by releases before 8.4;
-   application_name was not reported by releases before
-   9.0;
-   default_transaction_read_only and
+   Parameters reported as of the current release include:
+   
+application_name
+client_encoding
+DateStyle
+default_transaction_read_only
+in_hot_standby
+integer_datetimes
+IntervalStyle
+is_superuser
+scram_iterations
+server_encoding
+server_version
+session_authorization
+standard_conforming_strings
+TimeZone
+   
+   (default_transaction_read_only and
in_hot_standby were not reported by releases before
-   14.)
+   14; scram_iterations was not reported by releases
+   before 16.)
Note that
server_version,
server_encoding and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index bb4fef1f51..ed1d62f5f8 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1314,30 +1314,27 @@ SELCT 1/0;
 

 At present there is a hard-wired set of parameters for which
-ParameterStatus will be generated: they are
-server_version,
-server_encoding,
-client_encoding,
-application_name,
-default_transaction_read_only,
-in_hot_standby,
-is_superuser,
-session_authorization,
-DateStyle,
-IntervalStyle,
-TimeZone,
-integer_datetimes, and
-standard_conforming_strings.
-(server_encoding, TimeZone, and
-integer_datetimes were not reported by releases before 8.0;
-standard_conforming_strings was not reported by releases
-before 8.1;
-IntervalStyle was not reported by releases before 8.4;
-application_name was not reported by releases before
-9.0;
-default_transaction_read_only and
+ParameterStatus will be generated.  They are:
+
+ application_name
+ client_encoding
+ DateStyle
+ default_transaction_read_only
+ in_hot_standby
+ integer_datetimes
+ IntervalStyle
+ is_superuser
+ scram_iterations
+ server_encoding
+ server_version
+ session_authorization
+ standard_conforming_strings
+ TimeZone
+
+(default_transaction_read_only and
 in_hot_standby were not reported by releases before
-14.)
+14; scram_iterations was not reported by releases
+before 16.)
 Note that
 server_version,
 server_encoding and
-- 
2.39.2



Re: [PATCH] Add native windows on arm64 support

2024-01-30 Thread Dave Cramer
On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan  wrote:

>
> On 2024-01-29 Mo 11:20, Dave Cramer wrote:
>
>
> Dave Cramer
> www.postgres.rocks
>
>
> On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan  wrote:
>
>>
>> On 2024-01-26 Fr 09:18, Dave Cramer wrote:
>>
>>
>>
>> On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan  wrote:
>>
>>>
>>> On 2024-01-25 Th 20:32, Michael Paquier wrote:
>>> > On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>>> >> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan 
>>> wrote:
>>> >>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> >>> Yeah, I think the default Developer Command Prompt for VS2022 is set
>>> up
>>> >>> for x86 builds. AIUI you should start by executing "vcvarsall
>>> x64_arm64".
>>> >> Yup, now I'm in the same state you are
>>> > Wait a minute here.  Based on [1], x64_arm64 means you can use a x64
>>> > host and you'll be able to produce ARM64 builds, still these will not
>>> > be able to run on the host where they were built.  How much of the
>>> > patch posted upthread is required to produce such builds?  Basically
>>> > everything from it, I guess, so as build dependencies can be
>>> > satisfied?
>>> >
>>> > [1]:
>>> https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
>>>
>>>
>>> If you look at the table here x86 and x64 are the only supported host
>>> architectures. But that's OK, the x64 binaries will run on arm64 (W11
>>> ARM64 has x64 emulation builtin). If that didn't work Dave and I would
>>> not have got as far as we have. But you want the x64_arm64 argument to
>>> vcvarsall so you will get ARM64 output.
>>>
>>
>> I've rebuilt it using  x64_arm64 and with the attached (very naive patch)
>> and I still get an x64 binary :(
>>
>>
>> With this patch I still get a build error, but it's different :-)
>>
>>
>> [1406/2088] "link" @src/backend/postgres.exe.rsp
>> FAILED: src/backend/postgres.exe src/backend/postgres.pdb
>> "link" @src/backend/postgres.exe.rsp
>>Creating library src\backend\postgres.exe.lib
>>
>> storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol
>> spin_delay referenced in function perform_spin_delay
>>
>> src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
>>
>
> Did you add the latest lock.patch ?
>
>
>
>
> I'm a bit confused about exactly what needs to be applied. Can you supply
> a complete patch to be applied to a pristine checkout that will let me
> build?
>
>
> cheers
>

See attached.

Dave


0001-fix-build-for-arm.patch
Description: Binary data


0002-naive-patch-to-fix-locking-for-arm64.patch
Description: Binary data


Re: scram_iterations is undocumented GUC_REPORT

2024-01-30 Thread Daniel Gustafsson
> On 30 Jan 2024, at 15:36, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>> I noticed while answering a question that commit b577743000cd added the
>> GUC scram_iterations and marked it GUC_REPORT, but failed to add it to
>> the PQparameterStatus documentation.
> 
> Why is it GUC_REPORT at all?  I don't see a strong need for that.
> (In particular, the report would be delivered too late to be of any
> use in client authentication.)

It's needed by PQencryptPasswordConn.

--
Daniel Gustafsson





Documentation: warn about two_phase when altering a subscription

2024-01-30 Thread Bertrand Drouvot
Hi hackers,

776621a5e4 added a "warning" to the ALTER SUBSCRIPTION documentation (to
ensure the slot's failover property matches the subscription's).

The same remark applies to the two_phase option. This patch is an attempt
to address that.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 756886e59afddd09fa6f87ab95af7292ebca3e76 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Tue, 30 Jan 2024 14:48:16 +
Subject: [PATCH v1] Documentation: warn about two_phase when altering a
 subscription's slot name.

776621a5e4 added a warning about the newly added failover option. Do the same
for the already existing two_phase option.
---
 doc/src/sgml/ref/alter_subscription.sgml | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)
 100.0% doc/src/sgml/ref/

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index e9e6d9d74a..cd553f6312 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -235,15 +235,17 @@ ALTER SUBSCRIPTION name RENAME TO <
  
   When altering the
   slot_name,
-  the failover property value of the named slot may differ from the
+  the failover and two_phase property
+  values of the named slot may differ from the corresponding
   failover
-  parameter specified in the subscription. When creating the slot,
-  ensure the slot failover property matches the
+  and two_phase
+  parameters specified in the subscription. When creating the slot, ensure
+  the slot failover and two_phase
+  properties match the corresponding parameter values of the subscription.
+  Otherwise, the slot on the publisher may behave differently from what the subscription's
   failover
-  parameter value of the subscription. Otherwise, the slot on the
-  publisher may behave differently from what subscription's
-  failover
-  option says. The slot on the publisher could either be
+  and two_phase
+  options say: for example, the slot on the publisher could either be
   synced to the standbys even when the subscription's
   failover
   option is disabled or could be disabled for sync
-- 
2.34.1



Re: Bytea PL/Perl transform

2024-01-30 Thread Dagfinn Ilmari Mannsåker
Pavel Stehule  writes:

> I inserted perl reference support - hstore_plperl and json_plperl does it.
>
> +<->/* Dereference references recursively. */
> +<->while (SvROK(in))
> +<-><-->in = SvRV(in);

That code in hstore_plperl and json_plperl is only relevant because they
deal with non-scalar values (hashes for hstore, and also arrays for
json) which must be passed as references.  The recursive nature of the
dereferencing is questionable, and masked the bug fixed by commit
1731e3741cbbf8e0b4481665d7d523bc55117f63.

bytea_plperl only deals with scalars (specifically strings), so should
not concern itself with references.  In fact, this code breaks returning
objects with overloaded stringification, for example:

CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
  TRANSFORM FOR TYPE bytea
  AS $$
package StringOverload { use overload '""' => sub { "stuff" }; }
return bless {}, "StringOverload";
  $$;

This makes the server crash with an assertion failure from Perl because
SvPVbyte() was passed a non-scalar value:

postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865: 
Perl_sv_2pv_flags:
Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV && SvTYPE(sv) != 
SVt_PVFM' failed.

If I remove the dereferencing loop it succeeds:

SELECT encode(plperlu_overload(), 'escape') AS string;
 string

 stuff
(1 row)

Attached is a v2 patch which removes the dereferencing and includes the
above example as a test.

- ilmari

From aabaf4f5932f59de2fed48bbba7339807a1f04bd Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Tue, 30 Jan 2024 10:31:00 +0100
Subject: [PATCH v2] Add bytea transformation for plperl

---
 contrib/Makefile  |  4 +-
 contrib/bytea_plperl/.gitignore   |  4 ++
 contrib/bytea_plperl/Makefile | 39 ++
 contrib/bytea_plperl/bytea_plperl--1.0.sql| 19 +++
 contrib/bytea_plperl/bytea_plperl.c   | 44 
 contrib/bytea_plperl/bytea_plperl.control |  7 +++
 contrib/bytea_plperl/bytea_plperlu--1.0.sql   | 19 +++
 contrib/bytea_plperl/bytea_plperlu.control|  6 +++
 .../bytea_plperl/expected/bytea_plperl.out| 49 ++
 .../bytea_plperl/expected/bytea_plperlu.out   | 49 ++
 contrib/bytea_plperl/meson.build  | 51 +++
 contrib/bytea_plperl/sql/bytea_plperl.sql | 31 +++
 contrib/bytea_plperl/sql/bytea_plperlu.sql| 31 +++
 contrib/meson.build   |  1 +
 doc/src/sgml/plperl.sgml  | 16 ++
 15 files changed, 368 insertions(+), 2 deletions(-)
 create mode 100644 contrib/bytea_plperl/.gitignore
 create mode 100644 contrib/bytea_plperl/Makefile
 create mode 100644 contrib/bytea_plperl/bytea_plperl--1.0.sql
 create mode 100644 contrib/bytea_plperl/bytea_plperl.c
 create mode 100644 contrib/bytea_plperl/bytea_plperl.control
 create mode 100644 contrib/bytea_plperl/bytea_plperlu--1.0.sql
 create mode 100644 contrib/bytea_plperl/bytea_plperlu.control
 create mode 100644 contrib/bytea_plperl/expected/bytea_plperl.out
 create mode 100644 contrib/bytea_plperl/expected/bytea_plperlu.out
 create mode 100644 contrib/bytea_plperl/meson.build
 create mode 100644 contrib/bytea_plperl/sql/bytea_plperl.sql
 create mode 100644 contrib/bytea_plperl/sql/bytea_plperlu.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..56c628df00 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -77,9 +77,9 @@ ALWAYS_SUBDIRS += sepgsql
 endif
 
 ifeq ($(with_perl),yes)
-SUBDIRS += bool_plperl hstore_plperl jsonb_plperl
+SUBDIRS += bool_plperl bytea_plperl hstore_plperl jsonb_plperl
 else
-ALWAYS_SUBDIRS += bool_plperl hstore_plperl jsonb_plperl
+ALWAYS_SUBDIRS += bool_plperl bytea_plperl hstore_plperl jsonb_plperl
 endif
 
 ifeq ($(with_python),yes)
diff --git a/contrib/bytea_plperl/.gitignore b/contrib/bytea_plperl/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/bytea_plperl/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/bytea_plperl/Makefile b/contrib/bytea_plperl/Makefile
new file mode 100644
index 00..1814d2f418
--- /dev/null
+++ b/contrib/bytea_plperl/Makefile
@@ -0,0 +1,39 @@
+# contrib/bytea_plperl/Makefile
+
+MODULE_big = bytea_plperl
+OBJS = \
+	$(WIN32RES) \
+	bytea_plperl.o
+PGFILEDESC = "bytea_plperl - bytea transform for plperl"
+
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl
+
+EXTENSION = bytea_plperlu bytea_plperl
+DATA = bytea_plperlu--1.0.sql bytea_plperl--1.0.sql
+
+REGRESS = bytea_plperl bytea_plperlu
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/bytea_plperl
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# We must link libperl explicitly
+ifeq ($(PORTNAME), win32)
+#

Re: Synchronizing slots from primary to standby

2024-01-30 Thread Bertrand Drouvot
Hi,

On Mon, Jan 29, 2024 at 09:15:57AM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Jan 29, 2024 at 02:35:52PM +0530, Amit Kapila wrote:
> > I think it is better to create a separate patch for two_phase after
> > this patch gets committed.
> 
> Yeah, makes sense, will do, thanks!

It's done in [1].

[1]: 
https://www.postgresql.org/message-id/ZbkYrLPhH%2BRxpZlW%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: cleanup patches for incremental backup

2024-01-30 Thread Robert Haas
On Mon, Jan 29, 2024 at 4:13 PM Nathan Bossart  wrote:
> On Mon, Jan 29, 2024 at 03:18:50PM -0500, Robert Haas wrote:
> > I'm wondering if what we need to do is run pg_walsummary on both
> > summary files in that case. If we just pick one or the other, how do
> > we know which one to pick?
>
> Even if we do that, isn't it possible that none of the summaries will
> include the change?  Presently, we get the latest summarized LSN, make a
> change, and then wait for the next summary file with a greater LSN than
> what we saw before the change.  But AFAICT there's no guarantee that means
> the change has been summarized yet, although the chances of that happening
> in a test are probably pretty small.
>
> Could we get the LSN before and after making the change and then inspect
> all summaries that include that LSN range?

The trick here is that each WAL summary file covers one checkpoint
cycle. The intent of the test is to load data into the table,
checkpoint, see what summaries exist, then update a row, checkpoint
again, and see what summaries now exist. We expect one new summary
because there's been one new checkpoint. When I was thinking about
this yesterday, I was imagining that we were somehow getting an extra
checkpoint in some cases. But it looks like it's actually an
off-by-one situation. In
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-01-29%2018%3A09%3A10
the new files that show up between "after insert" and "after new
summary" are:

00010152FAE0015AAAC8.summary (LSN distance ~500k)
00010152F7A80152FAE0.summary (LSN distance 824 bytes)

The checkpoint after the inserts says:

LOG:  checkpoint complete: wrote 14 buffers (10.9%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.956 s, sync=0.929 s, total=3.059
s; sync files=39, longest=0.373 s, average=0.024 s; distance=491 kB,
estimate=491 kB; lsn=0/15AAB20, redo lsn=0/15AAAC8

And the checkpoint after the single-row update says:

LOG:  checkpoint complete: wrote 4 buffers (3.1%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.648 s, sync=0.355 s, total=2.798
s; sync files=3, longest=0.348 s, average=0.119 s; distance=11 kB,
estimate=443 kB; lsn=0/15AD770, redo lsn=0/15AD718

So both of the new WAL summary files that are appearing here are from
checkpoints that happened before the single-row update. The larger
file is the one covering the 400 inserts, and the smaller one is the
checkpoint before that. Which means that the "Wait for a new summary
to show up." code isn't actually waiting long enough, and then the
whole thing goes haywire. The problem is, I think, that this code
naively thinks it can just wait for summarized_lsn and everything will
be fine ... but that assumes we were caught up when we first measured
the summarized_lsn, and that need not be so, because it takes some
short but non-zero amount of time for the summarizer to catch up with
the WAL generated during initdb.

I think the solution here is to find a better way to wait for the
inserts to be summarized, one that actually does wait for that to
happen.
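
One way to phrase such a wait (a sketch only, assuming the
pg_available_wal_summaries() function that ships with WAL summarization; the
TAP test would poll this until it returns true):

-- capture the location of the inserts first
SELECT pg_current_wal_insert_lsn() AS insert_lsn \gset
-- then wait until some summary's end_lsn has moved past it
SELECT EXISTS (
    SELECT 1 FROM pg_available_wal_summaries()
    WHERE end_lsn >= :'insert_lsn'
) AS summarized;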

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Fix some ubsan/asan related issues

2024-01-30 Thread Tristan Partin

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, and there 
didn't seem to be anything preventing that in the LogLogicalMessage() 
codepath. Here is a preventative measure in LogLogicalMessage() and an 
Assert() in CopyXLogRecordToWAL().


Patch 2:

Support building with -Db_sanitize=address in Meson. Various executables 
are leaky, which can cause builds and tests to fail even though we 
are fine with leaking this memory.


Personally, I am a big stickler for always freeing my memory in 
executables even if it gets released at program exit because it makes 
the leak sanitizer much more effective. However (!), I am not here to 
change the philosophy of memory management in one-off executables, so 
I have just silenced memory leaks in various executables for now.


Patch 3:

THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!

In my effort to try to see if the test suite would pass with asan 
enabled, I ran into a max_stack_depth issue. I tried maxing it out 
(hence, the patch), but that still didn't remedy my issue. I tried to 
look on the list for any relevant emails, but nothing turned up. Maybe 
others have not attempted this before? Seems doubtful.


Not entirely sure how to fix this issue. I personally find asan 
extremely effective, even more than valgrind, so it would be great if 
I could run Postgres with asan enabled to catch various stupid C issues 
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres 
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't 
seem to leave enough stack space for Postgres.
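
For reference, this is the knob and the limit involved (a sketch; 7680 kB is 
the cap derived from this machine's 8192 kB "ulimit -s"):

SHOW max_stack_depth;            -- 2MB by default
SET max_stack_depth = '7680kB';  -- the most the server accepts under an 8192 kB rlimit
-- Even at the cap, asan's extra stack usage apparently leaves too little
-- headroom for the regression tests.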


---

I would like to see patch 1 reviewed and committed. Patch 2 honestly 
doesn't matter unless asan support can be fixed. I can also add a patch 
that errors out the Meson build if asan support is requested. That way 
others don't spend time heading down a dead end.


--
Tristan Partin
Neon (https://neon.tech)




Re: Fix some ubsan/asan related issues

2024-01-30 Thread Tristan Partin
Spend so much time writing out the email, I once again forget 
attachments...UGH.


--
Tristan Partin
Neon (https://neon.tech)
From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 29 Jan 2024 18:03:39 -0600
Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
 NULL

If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
the API contract of memcpy in glibc. The two pointer arguments are
marked as nonnull, even in the event the amount to copy is 0 bytes.
---
 src/backend/access/transam/xlog.c | 1 +
 src/backend/replication/logical/message.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..929888beb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1288,6 +1288,7 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 		}
 
 		Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
+		Assert(rdata_data != NULL);
 		memcpy(currpos, rdata_data, rdata_len);
 		currpos += rdata_len;
 		CurrPos += rdata_len;
diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index 2ac34e7781..126c57ef6e 100644
--- a/src/backend/replication/logical/message.c
+++ b/src/backend/replication/logical/message.c
@@ -67,7 +67,8 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 	XLogBeginInsert();
 	XLogRegisterData((char *) &xlrec, SizeOfLogicalMessage);
 	XLogRegisterData(unconstify(char *, prefix), xlrec.prefix_size);
-	XLogRegisterData(unconstify(char *, message), size);
+	if (message)
+		XLogRegisterData(unconstify(char *, message), size);
 
 	/* allow origin filtering */
 	XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
-- 
Tristan Partin
Neon (https://neon.tech)

From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Wed, 24 Jan 2024 17:07:01 -0600
Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address

The ecpg parser is extremely leaky, so we need to silence leak
detection.
---
 meson.build|  3 +++
 src/bin/initdb/initdb.c| 11 +++
 src/bin/pg_config/pg_config.c  | 10 ++
 src/bin/pg_resetwal/pg_resetwal.c  | 10 ++
 src/include/pg_config.h.in |  5 +
 src/interfaces/ecpg/preproc/ecpg.c | 11 +++
 6 files changed, 50 insertions(+)

diff --git a/meson.build b/meson.build
index 8ed51b6aae..d8c524d6f6 100644
--- a/meson.build
+++ b/meson.build
@@ -2530,6 +2530,9 @@ cdata.set_quoted('PG_VERSION_STR',
   )
 )
 
+if get_option('b_sanitize').contains('address')
+  cdata.set('USE_ADDRESS_SANITIZER', 1)
+endif
 
 ###
 # NLS / Gettext
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..e18e716d9c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -338,6 +338,17 @@ do { \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
+
 /*
  * Escape single quotes and backslashes, suitably for insertions into
  * configuration files or SQL E'' strings.
diff --git a/src/bin/pg_config/pg_config.c b/src/bin/pg_config/pg_config.c
index 77d09ccfc4..26d0b2f62b 100644
--- a/src/bin/pg_config/pg_config.c
+++ b/src/bin/pg_config/pg_config.c
@@ -67,6 +67,16 @@ static const InfoItem info_items[] = {
 	{NULL, NULL}
 };
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
 
 static void
 help(void)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..54f1ce5e44 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -89,6 +89,16 @@ static void KillExistingWALSummaries(void);
 static void WriteEmptyXLOG(void);
 static void usage(void);
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
 
 int
 main(int argc, char *argv[])
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 07e73567dc..ce0c700b6d 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -668,6 +668,11 @@
 /* Define to 1 if strerror_r() returns int. */
 #undef STRERROR_R_INT
 
+/* Define to 1 if using the address sanitizer. Typically this can be detected
+ * with __has_feature(address_sanitizer), but GCC doesn't support it with C99.
+ * Remove it when the standard is bumped. */
+#undef USE_ADDRESS_SANITIZER
+
 /* Define to 1 to use ARMv8 CRC Extension. */
 #undef USE_ARMV8_CRC32C
 
diff --git a/src

Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
ilm...@ilmari.org> napsal:

> Pavel Stehule  writes:
>
> > I inserted perl reference support - hstore_plperl and json_plperl does
> it.
> >
> > +<->/* Dereference references recursively. */
> > +<->while (SvROK(in))
> > +<-><-->in = SvRV(in);
>
> That code in hstore_plperl and json_plperl is only relevant because they
> deal with non-scalar values (hashes for hstore, and also arrays for
> json) which must be passed as references.  The recursive nature of the
> dereferencing is questionable, and masked the bug fixed by commit
> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
>
> bytea_plperl only deals with scalars (specifically strings), so should
> not concern itself with references.  In fact, this code breaks returning
> objects with overloaded stringification, for example:
>
> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
>   TRANSFORM FOR TYPE bytea
>   AS $$
> package StringOverload { use overload '""' => sub { "stuff" }; }
> return bless {}, "StringOverload";
>   $$;
>
> This makes the server crash with an assertion failure from Perl because
> SvPVbyte() was passed a non-scalar value:
>
> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
> Perl_sv_2pv_flags:
> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV && SvTYPE(sv)
> != SVt_PVFM' failed.
>
> If I remove the dereferincing loop it succeeds:
>
> SELECT encode(plperlu_overload(), 'escape') AS string;
>  string
> 
>  stuff
> (1 row)
>
> Attached is a v2 patch which removes the dereferencing and includes the
> above example as a test.
>

But without the dereference it returns a bad value.

Maybe there should be a check so references cannot be returned? It is probably
not safe to pass pointers between Perl and Postgres.



>
> - ilmari
>
>


Re: Parallelize correlated subqueries that execute within each worker

2024-01-30 Thread Akshat Jaimini
I think we should move this patch to the next CF as I believe that work is 
still going on to resolve the last reported bug.

Re: Bytea PL/Perl transform

2024-01-30 Thread Dagfinn Ilmari Mannsåker
Pavel Stehule  writes:

> út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> napsal:
>
>> Pavel Stehule  writes:
>>
>> > I inserted perl reference support - hstore_plperl and json_plperl does
>> it.
>> >
>> > +<->/* Dereference references recursively. */
>> > +<->while (SvROK(in))
>> > +<-><-->in = SvRV(in);
>>
>> That code in hstore_plperl and json_plperl is only relevant because they
>> deal with non-scalar values (hashes for hstore, and also arrays for
>> json) which must be passed as references.  The recursive nature of the
>> dereferencing is questionable, and masked the bug fixed by commit
>> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
>>
>> bytea_plperl only deals with scalars (specifically strings), so should
>> not concern itself with references.  In fact, this code breaks returning
>> objects with overloaded stringification, for example:
>>
>> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
>>   TRANSFORM FOR TYPE bytea
>>   AS $$
>> package StringOverload { use overload '""' => sub { "stuff" }; }
>> return bless {}, "StringOverload";
>>   $$;
>>
>> This makes the server crash with an assertion failure from Perl because
>> SvPVbyte() was passed a non-scalar value:
>>
>> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
>> Perl_sv_2pv_flags:
>> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV && SvTYPE(sv)
>> != SVt_PVFM' failed.
>>
>> If I remove the dereferincing loop it succeeds:
>>
>> SELECT encode(plperlu_overload(), 'escape') AS string;
>>  string
>> 
>>  stuff
>> (1 row)
>>
>> Attached is a v2 patch which removes the dereferencing and includes the
>> above example as a test.
>>
>
> But without dereference it returns bad value.

Where exactly does it return a bad value?  The existing tests pass, and
the one I included shows that it does the right thing in that case too.
If you pass it an unblessed reference it returns the stringified version
of that, as expected.

CREATE FUNCTION plperl_reference() RETURNS bytea LANGUAGE plperl
 TRANSFORM FOR TYPE bytea
 AS $$ return []; $$;

SELECT encode(plperl_reference(), 'escape') string;
string
---
 ARRAY(0x559a3109f0a8)
(1 row)

This would also crash if the dereferencing loop was left in place.

> Maybe there should be a check so references cannot be returned? Probably is
> not safe pass pointers between Perl and Postgres.

There's no reason to ban references, that would break every Perl
programmer's expectations.  And there are no pointers being passed,
SvPVbyte() returns the stringified form of whatever's passed in, which
is well-behaved for both blessed and unblessed references.

If we really want to be strict, we should at least allow references to
objects that overload stringification, as they are explicitly designed
to be well-behaved as strings.  But that would be a lot of extra code
for very little benefit over just letting Perl stringify everything.

- ilmari




Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
ilm...@ilmari.org> napsal:

> Pavel Stehule  writes:
>
> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
> > ilm...@ilmari.org> napsal:
> >
> >> Pavel Stehule  writes:
> >>
> >> > I inserted perl reference support - hstore_plperl and json_plperl does
> >> it.
> >> >
> >> > +<->/* Dereference references recursively. */
> >> > +<->while (SvROK(in))
> >> > +<-><-->in = SvRV(in);
> >>
> >> That code in hstore_plperl and json_plperl is only relevant because they
> >> deal with non-scalar values (hashes for hstore, and also arrays for
> >> json) which must be passed as references.  The recursive nature of the
> >> dereferencing is questionable, and masked the bug fixed by commit
> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
> >>
> >> bytea_plperl only deals with scalars (specifically strings), so should
> >> not concern itself with references.  In fact, this code breaks returning
> >> objects with overloaded stringification, for example:
> >>
> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
> >>   TRANSFORM FOR TYPE bytea
> >>   AS $$
> >> package StringOverload { use overload '""' => sub { "stuff" }; }
> >> return bless {}, "StringOverload";
> >>   $$;
> >>
> >> This makes the server crash with an assertion failure from Perl because
> >> SvPVbyte() was passed a non-scalar value:
> >>
> >> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
> >> Perl_sv_2pv_flags:
> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
> SvTYPE(sv)
> >> != SVt_PVFM' failed.
> >>
> >> If I remove the dereferincing loop it succeeds:
> >>
> >> SELECT encode(plperlu_overload(), 'escape') AS string;
> >>  string
> >> 
> >>  stuff
> >> (1 row)
> >>
> >> Attached is a v2 patch which removes the dereferencing and includes the
> >> above example as a test.
> >>
> >
> > But without dereference it returns bad value.
>
> Where exactly does it return a bad value?  The existing tests pass, and
> the one I included shows that it does the right thing in that case too.
> If you pass it an unblessed reference it returns the stringified version
> of that, as expected.
>

ugly test code

(2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
perl_inverse_bytes(bytea) RETURNS bytea
TRANSFORM FOR TYPE bytea
AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;
return $ref;
$$ LANGUAGE plperlu;
CREATE FUNCTION
(2024-01-30 13:44:33) postgres=# select perl_inverse_bytes(''), ' '::bytea;
┌──┬───┐
│  perl_inverse_bytes  │ bytea │
╞══╪═══╡
│ \x5343414c41522830783130656134333829 │ \x20  │
└──┴───┘
(1 row)

expected

(2024-01-30 13:46:58) postgres=# select perl_inverse_bytes(''), ' '::bytea;
┌┬───┐
│ perl_inverse_bytes │ bytea │
╞╪═══╡
│ \x0123 │ \x20  │
└┴───┘
(1 row)


>
> CREATE FUNCTION plperl_reference() RETURNS bytea LANGUAGE plperl
>  TRANSFORM FOR TYPE bytea
>  AS $$ return []; $$;
>
> SELECT encode(plperl_reference(), 'escape') string;
> string
> ---
>  ARRAY(0x559a3109f0a8)
> (1 row)
>
> This would also crash if the dereferencing loop was left in place.
>
> > Maybe there should be a check so references cannot be returned? Probably
> is
> > not safe pass pointers between Perl and Postgres.
>
> There's no reason to ban references, that would break every Perl
> programmer's expectations.  And there are no pointers being passed,
> SvPVbyte() returns the stringified form of whatever's passed in, which
> is well-behaved for both blessed and unblessed references.
>
> If we really want to be strict, we should at least allow references to
> objects that overload stringification, as they are explicitly designed
> to be well-behaved as strings.  But that would be a lot of extra code
> for very little benefit over just letting Perl stringify everything.
>
> - ilmari
>


Re: Extend pgbench partitioning to pgbench_history

2024-01-30 Thread Gabriele Bartolini
Hi Abhijit,

Thanks for your input. Please accept my updated patch.
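
For a quick sanity check of the new behaviour (not part of the patch, just a
query to confirm that both tables come out partitioned):

-- after: pgbench -i --partitions=4 --partition-method=range
SELECT relname, relkind
FROM pg_class
WHERE relname IN ('pgbench_accounts', 'pgbench_history')
ORDER BY relname;
-- relkind 'p' marks a partitioned table, so both rows should show 'p'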

Ciao,
Gabriele

On Tue, 16 Jan 2024 at 12:53, Abhijit Menon-Sen  wrote:

> At 2023-11-30 11:29:15 +0100, gabriele.bartol...@enterprisedb.com wrote:
> >
> > With the attached patch, I extend the partitioning capability to the
> > pgbench_history table too.
> >
> > I have been thinking of adding an option to control this, but I preferred
> > to ask in this list whether it really makes sense or not (I struggle
> indeed
> > to see use cases where accounts is partitioned and history is not).
>
> I don't have a strong opinion about this, but I also can't think of a
> reason to want to create partitions for pgbench_accounts but leave out
> pgbench_history.
>
> > From ba8f507b126a9c5bd22dd40bb8ce0c1f0c43ac59 Mon Sep 17 00:00:00 2001
> > From: Gabriele Bartolini 
> > Date: Thu, 30 Nov 2023 11:02:39 +0100
> > Subject: [PATCH] Include pgbench_history in partitioning method for
> pgbench
> >
> > In case partitioning, make sure that pgbench_history is also partitioned
> with
> > the same criteria.
>
> I think "If partitioning" or "If we're creating partitions" would read
> better here. Also, same criteria as what? Maybe you could just add "as
> pgbench_accounts" to the end.
>
> > diff --git a/doc/src/sgml/ref/pgbench.sgml
> b/doc/src/sgml/ref/pgbench.sgml
> > index 05d3f81619..4c02d2a61d 100644
> > --- a/doc/src/sgml/ref/pgbench.sgml
> > +++ b/doc/src/sgml/ref/pgbench.sgml
> > […]
> > @@ -378,9 +378,9 @@ pgbench 
> options  d
> >
> --partitions=NUM
> >
> > 
> > -Create a partitioned pgbench_accounts table
> with
> > -NUM partitions of nearly equal size
> for
> > -the scaled number of accounts.
> > +Create partitioned pgbench_accounts and
> pgbench_history
> > +tables with NUM partitions of nearly
> equal size for
> > +the scaled number of accounts - and future history records.
> >  Default is 0, meaning no partitioning.
> > 
>
> I would just leave out the "-" and write "number of accounts and future
> history records".
>
> > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
> > index 2e1650d0ad..87adaf4d8f 100644
> > --- a/src/bin/pgbench/pgbench.c
> > +++ b/src/bin/pgbench/pgbench.c
> > […]
> > @@ -889,8 +889,10 @@ usage(void)
> >  "  --index-tablespace=TABLESPACE\n"
> >  "   create indexes in the
> specified tablespace\n"
> >  "  --partition-method=(range|hash)\n"
> > -"   partition pgbench_accounts
> with this method (default: range)\n"
> > -"  --partitions=NUM partition pgbench_accounts
> into NUM parts (default: 0)\n"
> > +"   partition pgbench_accounts
> and pgbench_history with this method"
> > +"   (default: range)."
> > +"  --partitions=NUM partition pgbench_accounts
> and pgbench_history into NUM parts"
> > +"   (default: 0)\n"
> >  "  --tablespace=TABLESPACE  create tables in the
> specified tablespace\n"
> >  "  --unlogged-tablescreate tables as unlogged
> tables\n"
> >  "\nOptions to select what to run:\n"
>
> There's a missing newline after "(default: range).".
>
> I read through the rest of the patch closely. It looks fine to me. It
> applies, builds, and does create the partitions as intended.
>
> -- Abhijit
>


-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


v2-0001-Include-pgbench_history-in-partitioning-method-fo.patch
Description: Binary data


Re: Bytea PL/Perl transform

2024-01-30 Thread Dagfinn Ilmari Mannsåker
Pavel Stehule  writes:

> út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> napsal:
>
>> Pavel Stehule  writes:
>>
>> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
>> > ilm...@ilmari.org> napsal:
>> >
>> >> Pavel Stehule  writes:
>> >>
>> >> > I inserted perl reference support - hstore_plperl and json_plperl does
>> >> it.
>> >> >
>> >> > +<->/* Dereference references recursively. */
>> >> > +<->while (SvROK(in))
>> >> > +<-><-->in = SvRV(in);
>> >>
>> >> That code in hstore_plperl and json_plperl is only relevant because they
>> >> deal with non-scalar values (hashes for hstore, and also arrays for
>> >> json) which must be passed as references.  The recursive nature of the
>> >> dereferencing is questionable, and masked the bug fixed by commit
>> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
>> >>
>> >> bytea_plperl only deals with scalars (specifically strings), so should
>> >> not concern itself with references.  In fact, this code breaks returning
>> >> objects with overloaded stringification, for example:
>> >>
>> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
>> >>   TRANSFORM FOR TYPE bytea
>> >>   AS $$
>> >> package StringOverload { use overload '""' => sub { "stuff" }; }
>> >> return bless {}, "StringOverload";
>> >>   $$;
>> >>
>> >> This makes the server crash with an assertion failure from Perl because
>> >> SvPVbyte() was passed a non-scalar value:
>> >>
>> >> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
>> >> Perl_sv_2pv_flags:
>> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
>> SvTYPE(sv)
>> >> != SVt_PVFM' failed.
>> >>
>> >> If I remove the dereferincing loop it succeeds:
>> >>
>> >> SELECT encode(plperlu_overload(), 'escape') AS string;
>> >>  string
>> >> 
>> >>  stuff
>> >> (1 row)
>> >>
>> >> Attached is a v2 patch which removes the dereferencing and includes the
>> >> above example as a test.
>> >>
>> >
>> > But without dereference it returns bad value.
>>
>> Where exactly does it return a bad value?  The existing tests pass, and
>> the one I included shows that it does the right thing in that case too.
>> If you pass it an unblessed reference it returns the stringified version
>> of that, as expected.
>>
>
> ugly test code
>
> (2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
> perl_inverse_bytes(bytea) RETURNS bytea
> TRANSFORM FOR TYPE bytea
> AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;

You are returning a reference, not a string.

> return $ref;
> $$ LANGUAGE plperlu;
> CREATE FUNCTION
> (2024-01-30 13:44:33) postgres=# select perl_inverse_bytes(''), ' '::bytea;
> ┌──┬───┐
> │  perl_inverse_bytes  │ bytea │
> ╞══╪═══╡
> │ \x5343414c41522830783130656134333829 │ \x20  │
> └──┴───┘
> (1 row)

~=# select encode('\x5343414c41522830783130656134333829', 'escape');
┌───┐
│  encode   │
├───┤
│ SCALAR(0x10ea438) │
└───┘

This is how Perl stringifies references in the absence of overloading.
Return the byte string directly from your function and it will do the
right thing:

CREATE FUNCTION plperlu_bytes() RETURNS bytea LANGUAGE plperlu
 TRANSFORM FOR TYPE bytea
 AS $$ return pack  'H*', '0123'; $$;

SELECT plperlu_bytes();
 plperlu_bytes
---
 \x0123
(1 row)


- ilmari




Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 17:46 odesílatel Dagfinn Ilmari Mannsåker <
ilm...@ilmari.org> napsal:

> Pavel Stehule  writes:
>
> > út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
> > ilm...@ilmari.org> napsal:
> >
> >> Pavel Stehule  writes:
> >>
> >> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
> >> > ilm...@ilmari.org> napsal:
> >> >
> >> >> Pavel Stehule  writes:
> >> >>
> >> >> > I inserted perl reference support - hstore_plperl and json_plperl
> does
> >> >> it.
> >> >> >
> >> >> > +<->/* Dereference references recursively. */
> >> >> > +<->while (SvROK(in))
> >> >> > +<-><-->in = SvRV(in);
> >> >>
> >> >> That code in hstore_plperl and json_plperl is only relevant because
> they
> >> >> deal with non-scalar values (hashes for hstore, and also arrays for
> >> >> json) which must be passed as references.  The recursive nature of
> the
> >> >> dereferencing is questionable, and masked the bug fixed by commit
> >> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
> >> >>
> >> >> bytea_plperl only deals with scalars (specifically strings), so
> should
> >> >> not concern itself with references.  In fact, this code breaks
> returning
> >> >> objects with overloaded stringification, for example:
> >> >>
> >> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
> >> >>   TRANSFORM FOR TYPE bytea
> >> >>   AS $$
> >> >> package StringOverload { use overload '""' => sub { "stuff" }; }
> >> >> return bless {}, "StringOverload";
> >> >>   $$;
> >> >>
> >> >> This makes the server crash with an assertion failure from Perl
> because
> >> >> SvPVbyte() was passed a non-scalar value:
> >> >>
> >> >> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
> >> >> Perl_sv_2pv_flags:
> >> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
> >> SvTYPE(sv)
> >> >> != SVt_PVFM' failed.
> >> >>
> >> >> If I remove the dereferincing loop it succeeds:
> >> >>
> >> >> SELECT encode(plperlu_overload(), 'escape') AS string;
> >> >>  string
> >> >> 
> >> >>  stuff
> >> >> (1 row)
> >> >>
> >> >> Attached is a v2 patch which removes the dereferencing and includes
> the
> >> >> above example as a test.
> >> >>
> >> >
> >> > But without dereference it returns bad value.
> >>
> >> Where exactly does it return a bad value?  The existing tests pass, and
> >> the one I included shows that it does the right thing in that case too.
> >> If you pass it an unblessed reference it returns the stringified version
> >> of that, as expected.
> >>
> >
> > ugly test code
> >
> > (2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
> > perl_inverse_bytes(bytea) RETURNS bytea
> > TRANSFORM FOR TYPE bytea
> > AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;
>
> You are returning a reference, not a string.
>

I know, but shouldn't an error be raised in this case?


>
> > return $ref;
> > $$ LANGUAGE plperlu;
> > CREATE FUNCTION
> > (2024-01-30 13:44:33) postgres=# select perl_inverse_bytes(''), '
> '::bytea;
> > ┌──┬───┐
> > │  perl_inverse_bytes  │ bytea │
> > ╞══╪═══╡
> > │ \x5343414c41522830783130656134333829 │ \x20  │
> > └──┴───┘
> > (1 row)
>
> ~=# select encode('\x5343414c41522830783130656134333829', 'escape');
> ┌───┐
> │  encode   │
> ├───┤
> │ SCALAR(0x10ea438) │
> └───┘
>
> This is how Perl stringifies references in the absence of overloading.
> Return the byte string directly from your function and it will do the
> right thing:
>
> CREATE FUNCTION plperlu_bytes() RETURNS bytea LANGUAGE plperlu
>  TRANSFORM FOR TYPE bytea
>  AS $$ return pack  'H*', '0123'; $$;
>
> SELECT plperlu_bytes();
>  plperlu_bytes
> ---
>  \x0123
> (1 row)
>
>
> - ilmari
>


Re: cleanup patches for incremental backup

2024-01-30 Thread Robert Haas
On Tue, Jan 30, 2024 at 10:51 AM Robert Haas  wrote:
> I think the solution here is to find a better way to wait for the
> inserts to be summarized, one that actually does wait for that to
> happen.

Here's a patch for that. I now think
a7097ca630a25dd2248229f21ebce4968d85d10a was actually misguided, and
served only to mask some of the failures caused by waiting for the WAL
summary file.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Revise-pg_walsummary-s-002_blocks-test-to-avoid-s.patch
Description: Binary data


Re: Parallelize correlated subqueries that execute within each worker

2024-01-30 Thread Robert Haas
On Tue, Jan 30, 2024 at 11:17 AM Akshat Jaimini  wrote:
> I think we should move this patch to the next CF as I believe that work is 
> still going on resolving the last reported bug.

We shouldn't just keep pushing this forward to the next CF. It's been
idle since July. If it needs more work, mark it RwF and it can be
reopened when there's something for a reviewer to do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Gabriele Bartolini
Hi,

I am sending an updated patch, and submitting this to the next commit fest,
as I still believe this could be very useful.
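
To illustrate the intended behaviour (a sketch based on the GUC name in the
attached patch, enable_alter_system; the exact error wording is of course up
for discussion):

-- set by the operator / configuration management at bootstrap:
--   enable_alter_system = off
SHOW enable_alter_system;
ALTER SYSTEM SET wal_level TO minimal;
-- should now fail with an explicit "ALTER SYSTEM is disabled" style error
-- instead of the misleading "Permission denied" produced by the read-only
-- postgresql.auto.conf workaround described below.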

Thanks,
Gabriele

On Thu, 7 Sept 2023 at 21:51, Gabriele Bartolini <
gabriele.bartol...@enterprisedb.com> wrote:

> Hi everyone,
>
> I would like to propose a patch that allows administrators to disable
> `ALTER SYSTEM` via either a run-time option to pass to the Postgres server
> process at startup (e.g. `--disable-alter-system=true`, false by default)
> or a new GUC (or even both), without changing the current default method of
> the server.
>
> The main reason is that this would help improve the “security by default”
> posture of Postgres in a Kubernetes/Cloud Native environment - and, in
> general, in any environment on VMs/bare metal behind a configuration
> management system in which changes should only be made in a declarative way
> and versioned like Ansible Tower, to cite one.
>
> Below you find some background information and the longer story behind
> this proposal.
>
> Sticking to the Kubernetes use case, I am primarily speaking on behalf of
> the CloudNativePG open source operator (cloudnative-pg.io, of which I am
> one of the maintainers). However, I am sure that this option could benefit
> any operator for Postgres - an operator is the most common and recommended
> way to run a complex application like a PostgreSQL database management
> system inside Kubernetes.
>
> In this case, the state of a PostgreSQL cluster (for example its number of
> replicas, configuration, storage, etc.) is defined in a Custom Resource
> Definition in the form of configuration, typically YAML, and the operator
> works with Kubernetes to ensure that, at any moment, the requested Postgres
> cluster matches the observed one. This is a very basic example in
> CloudNativePG:
> https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml
>
> As I was mentioning above, in a Cloud Native environment it is expected
> that workloads are secure by default. Without going into much detail, many
> decisions have been made in that direction by operators for Postgres,
> including CloudNativePG. The goal of this proposal is to provide a way to
> ensure that changes to the PostgreSQL configuration in a Kubernetes
> controlled Postgres cluster are allowed only through the Kubernetes API.
>
> Basically, if you want to change an option for PostgreSQL, you need to
> change the desired state in the Kubernetes resource, then Kubernetes will
> converge (through the operator). In simple words, it’s like empowering the
> operator to impersonate the PostgreSQL superuser.
>
> However, given that we cannot force this use case, there could be roles
> with the login+superuser privileges connecting to the PostgreSQL instance
> and potentially “interfering” with the requested state defined in the
> configuration by imperatively running “ALTER SYSTEM” commands.
>
> For example: CloudNativePG has a fixed value for some GUCs in order to
> manage a full HA cluster, including SSL, log, some WAL and replication
> settings. While the operator eventually reconciles those settings, even the
> temporary change of those settings in a cluster might be harmful. Think for
> example of a user that, through `ALTER SYSTEM`, tries to change WAL level
> to minimal, or change the setting of the log (we require CSV), potentially
> creating issues to the underlying instance and cluster (potentially leaving
> it in an unrecoverable state in the case of other more invasive GUCS).
>
> At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked
> by making the postgresql.auto.conf read only, but the returned message is
> misleading and that’s certainly bad user experience (which is very
> important in a cloud native environment):
>
> ```
> postgres=# ALTER SYSTEM SET wal_level TO minimal;
> ERROR:  could not open file "postgresql.auto.conf": Permission denied
> ```
>
> For this reason, I would like to propose the option to be given to the
> postgres process at startup, in order to be as less invasive as possible
> (the operator could then start Postgres requesting `ALTER SYSTEM` to be
> disabled). That’d be my preference at the moment, if possible.
>
> Alternatively, or in addition, the introduction of a GUC to disable `ALTER
> SYSTEM` altogether. This enables tuning this setting through configuration
> at the Kubernetes level, only if the operators require it - without
> damaging the rest of the users.
>
> Before I start writing any lines of code and propose a patch, I would like
> first to understand if there’s room for it.
>
> Thanks for your attention and … looking forward to your feedback!
>
> Ciao,
> Gabriele
> --
> Gabriele Bartolini
> Vice President, Cloud Native at EDB
> enterprisedb.com
>


-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


0001-Add-enable_alter_system-GUC.patch
Description: Binary data


Re: separating use of SerialSLRULock

2024-01-30 Thread Alvaro Herrera
On 2024-Jan-29, Alvaro Herrera wrote:

> It's terrifying that SerialAdd() doesn't seem to be covered by any
> tests, though.

I realized that there's some coverage when compiling with
TEST_SUMMARIZE_SERIAL, so I tried that and it looks OK.

One other change I made was in the comment that explains the locking
order.  I had put the new lock at the top, but when I tested adding some
asserts to verify that the other locks are not held, they turn out to
fire soon enough ... and the conflicting lock is the last one of that
list.  So I added the new lock below it, and the SLRU lock further down,
because SerialAdd does it that way.

I pushed it now.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Bytea PL/Perl transform

2024-01-30 Thread Dagfinn Ilmari Mannsåker
Pavel Stehule  writes:

> út 30. 1. 2024 v 17:46 odesílatel Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> napsal:
>
>> Pavel Stehule  writes:
>>
>> > út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
>> > ilm...@ilmari.org> napsal:
>> >
>> >> Pavel Stehule  writes:
>> >>
>> >> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
>> >> > ilm...@ilmari.org> napsal:
>> >> >
>> >> >> Pavel Stehule  writes:
>> >> >>
>> >> >> > I inserted perl reference support - hstore_plperl and json_plperl
>> does
>> >> >> it.
>> >> >> >
>> >> >> > +<->/* Dereference references recursively. */
>> >> >> > +<->while (SvROK(in))
>> >> >> > +<-><-->in = SvRV(in);
>> >> >>
>> >> >> That code in hstore_plperl and json_plperl is only relevant because
>> they
>> >> >> deal with non-scalar values (hashes for hstore, and also arrays for
>> >> >> json) which must be passed as references.  The recursive nature of
>> the
>> >> >> dereferencing is questionable, and masked the bug fixed by commit
>> >> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
>> >> >>
>> >> >> bytea_plperl only deals with scalars (specifically strings), so
>> should
>> >> >> not concern itself with references.  In fact, this code breaks
>> returning
>> >> >> objects with overloaded stringification, for example:
>> >> >>
>> >> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
>> >> >>   TRANSFORM FOR TYPE bytea
>> >> >>   AS $$
>> >> >> package StringOverload { use overload '""' => sub { "stuff" }; }
>> >> >> return bless {}, "StringOverload";
>> >> >>   $$;
>> >> >>
>> >> >> This makes the server crash with an assertion failure from Perl
>> because
>> >> >> SvPVbyte() was passed a non-scalar value:
>> >> >>
>> >> >> postgres: ilmari regression_bytea_plperl [local] SELECT: sv.c:2865:
>> >> >> Perl_sv_2pv_flags:
>> >> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
>> >> SvTYPE(sv)
>> >> >> != SVt_PVFM' failed.
>> >> >>
>> >> >> If I remove the dereferencing loop it succeeds:
>> >> >>
>> >> >> SELECT encode(plperlu_overload(), 'escape') AS string;
>> >> >>  string
>> >> >> 
>> >> >>  stuff
>> >> >> (1 row)
>> >> >>
>> >> >> Attached is a v2 patch which removes the dereferencing and includes
>> the
>> >> >> above example as a test.
>> >> >>
>> >> >
>> >> > But without dereference it returns bad value.
>> >>
>> >> Where exactly does it return a bad value?  The existing tests pass, and
>> >> the one I included shows that it does the right thing in that case too.
>> >> If you pass it an unblessed reference it returns the stringified version
>> >> of that, as expected.
>> >>
>> >
>> > ugly test code
>> >
>> > (2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
>> > perl_inverse_bytes(bytea) RETURNS bytea
>> > TRANSFORM FOR TYPE bytea
>> > AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;
>>
>> You are returning a reference, not a string.
>>
>
> I know, but for this case, should not be raised an error?

I don't think so, as I explained in my previous reply:

> There's no reason to ban references, that would break every Perl
> programmer's expectations.

To elaborate on this: when a function is defined to return a string
(which bytea effectively is, as far as Perl is concerned), I as a Perl
programmer would expect PL/Perl to just stringify whatever value I
returned, according to the usual Perl rules.

I also said:

> If we really want to be strict, we should at least allow references to
> objects that overload stringification, as they are explicitly designed
> to be well-behaved as strings.  But that would be a lot of extra code
> for very little benefit over just letting Perl stringify everything.

By "a lot of code", I mean everything `string_amg`-related in the
amagic_applies() function
(https://github.com/Perl/perl5/blob/v5.38.0/gv.c#L3401-L3545).  We can't
just call it: it's only available since Perl 5.38 (released last year),
and we support Perl versions all the way back to 5.14.

- ilmari




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-01-30 Thread Alvaro Herrera
Hmm, this looks quite nice and simple.  My only comment is that a
sequence like this

   /* Read from WAL buffers, if available. */
   rbytes = XLogReadFromBuffers(&output_message.data[output_message.len],
startptr, nbytes, xlogreader->seg.ws_tli);
   output_message.len += rbytes;
   startptr += rbytes;
   nbytes -= rbytes;

   if (!WALRead(xlogreader,
&output_message.data[output_message.len],
startptr,

leaves you wondering if WALRead() should be called at all or not, in the
case when all bytes were read by XLogReadFromBuffers.  I think in many
cases what's going to happen is that nbytes is going to be zero, and
then WALRead is going to return having done nothing in its inner loop.
I think this warrants a comment somewhere.  Alternatively, we could
short-circuit the 'if' expression so that WALRead() is not called in
that case (but I'm not sure it's worth the loss of code clarity).
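A minimal sketch of that short-circuit, reusing the names from the snippet
above (errinfo handling as in the surrounding walsender code; whether the
extra branch is worth the loss of clarity is exactly the trade-off in
question):

    /* Read from WAL buffers, if available. */
    rbytes = XLogReadFromBuffers(&output_message.data[output_message.len],
                                 startptr, nbytes, xlogreader->seg.ws_tli);
    output_message.len += rbytes;
    startptr += rbytes;
    nbytes -= rbytes;

    /* Fall back to WALRead() only if the buffers did not cover everything. */
    if (nbytes > 0 &&
        !WALRead(xlogreader,
                 &output_message.data[output_message.len],
                 startptr, nbytes, xlogreader->seg.ws_tli,
                 &errinfo))
        WALReadRaiseError(&errinfo);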

Also, but this is really quite minor, it seems sad to add more functions
with the prefix XLog, when we have renamed things to use the prefix WAL,
and we have kept the old names only to avoid backpatchability issues.
I mean, if we have WALRead() already, wouldn't it make perfect sense to
name the new routine WALReadFromBuffers?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)




Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 18:26 odesílatel Dagfinn Ilmari Mannsåker <
ilm...@ilmari.org> napsal:

> Pavel Stehule  writes:
>
> > út 30. 1. 2024 v 17:46 odesílatel Dagfinn Ilmari Mannsåker <
> > ilm...@ilmari.org> napsal:
> >
> >> Pavel Stehule  writes:
> >>
> >> > út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
> >> > ilm...@ilmari.org> napsal:
> >> >
> >> >> Pavel Stehule  writes:
> >> >>
> >> >> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
> >> >> > ilm...@ilmari.org> napsal:
> >> >> >
> >> >> >> Pavel Stehule  writes:
> >> >> >>
> >> >> >> > I inserted perl reference support - hstore_plperl and
> json_plperl
> >> does
> >> >> >> it.
> >> >> >> >
> >> >> >> > +<->/* Dereference references recursively. */
> >> >> >> > +<->while (SvROK(in))
> >> >> >> > +<-><-->in = SvRV(in);
> >> >> >>
> >> >> >> That code in hstore_plperl and json_plperl is only relevant
> because
> >> they
> >> >> >> deal with non-scalar values (hashes for hstore, and also arrays
> for
> >> >> >> json) which must be passed as references.  The recursive nature of
> >> the
> >> >> >> dereferencing is questionable, and masked the bug fixed by commit
> >> >> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
> >> >> >>
> >> >> >> bytea_plperl only deals with scalars (specifically strings), so
> >> should
> >> >> >> not concern itself with references.  In fact, this code breaks
> >> returning
> >> >> >> objects with overloaded stringification, for example:
> >> >> >>
> >> >> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
> >> >> >>   TRANSFORM FOR TYPE bytea
> >> >> >>   AS $$
> >> >> >> package StringOverload { use overload '""' => sub { "stuff"
> }; }
> >> >> >> return bless {}, "StringOverload";
> >> >> >>   $$;
> >> >> >>
> >> >> >> This makes the server crash with an assertion failure from Perl
> >> because
> >> >> >> SvPVbyte() was passed a non-scalar value:
> >> >> >>
> >> >> >> postgres: ilmari regression_bytea_plperl [local] SELECT:
> sv.c:2865:
> >> >> >> Perl_sv_2pv_flags:
> >> >> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
> >> >> SvTYPE(sv)
> >> >> >> != SVt_PVFM' failed.
> >> >> >>
> >> >> >> If I remove the dereferencing loop it succeeds:
> >> >> >>
> >> >> >> SELECT encode(plperlu_overload(), 'escape') AS string;
> >> >> >>  string
> >> >> >> 
> >> >> >>  stuff
> >> >> >> (1 row)
> >> >> >>
> >> >> >> Attached is a v2 patch which removes the dereferencing and
> includes
> >> the
> >> >> >> above example as a test.
> >> >> >>
> >> >> >
> >> >> > But without dereference it returns bad value.
> >> >>
> >> >> Where exactly does it return a bad value?  The existing tests pass,
> and
> >> >> the one I included shows that it does the right thing in that case
> too.
> >> >> If you pass it an unblessed reference it returns the stringified
> version
> >> >> of that, as expected.
> >> >>
> >> >
> >> > ugly test code
> >> >
> >> > (2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
> >> > perl_inverse_bytes(bytea) RETURNS bytea
> >> > TRANSFORM FOR TYPE bytea
> >> > AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;
> >>
> >> You are returning a reference, not a string.
> >>
> >
> > I know, but for this case, should not be raised an error?
>
> I don't think so, as I explained in my previous reply:
>
> > There's no reason to ban references, that would break every Perl
> > programmer's expectations.
>
> To elaborate on this: when a function is defined to return a string
> (which bytea effectively is, as far as Perl is concerned), I as a Perl
> programmer would expect PL/Perl to just stringify whatever value I
> returned, according to the usual Perl rules.
>

ok

Pavel


>
> I also said:
>
> > If we really want to be strict, we should at least allow references to
> > objects that overload stringification, as they are explicitly designed
> > to be well-behaved as strings.  But that would be a lot of extra code
> > for very little benefit over just letting Perl stringify everything.
>

> By "a lot of code", I mean everything `string_amg`-related in the
> amagic_applies() function
> (https://github.com/Perl/perl5/blob/v5.38.0/gv.c#L3401-L3545).  We can't
> just call it: it's only available since Perl 5.38 (released last year),
> and we support Perl versions all the way back to 5.14.
>
> - ilmari
>


Re: Flushing large data immediately in pqcomm

2024-01-30 Thread Melih Mutlu
Hi Heikki,

Heikki Linnakangas , 29 Oca 2024 Pzt, 19:12 tarihinde şunu
yazdı:

> > Proposed change modifies socket_putmessage to send any data larger than
> > 8K immediately without copying it into the send buffer. Assuming that
> > the send buffer would be flushed anyway due to reaching its limit, the
> > patch just gets rid of the copy part which seems unnecessary and sends
> > data without waiting.
>
> If there's already some data in PqSendBuffer, I wonder if it would be
> better to fill it up with data, flush it, and then send the rest of the
> data directly. Instead of flushing the partial data first. I'm afraid
> that you'll make a tiny call to secure_write(), followed by a large one,
> then a tiny one again, and so forth. Especially when socket_putmessage
> itself writes the msgtype and len, which are tiny, before the payload.
>

I agree that I could do better there without flushing twice for both
PqSendBuffer and
input data. PqSendBuffer always has some data, even if it's tiny, since
msgtype and len are added.


> Perhaps we should invent a new pq_putmessage() function that would take
> an input buffer with 5 bytes of space reserved before the payload.
> pq_putmessage() could then fill in the msgtype and len bytes in the
> input buffer and send that directly. (Not wedded to that particular API,
> but something that would have the same effect)
>

I thought about doing this. The reason why I didn't was because I think
that such a change would require adjusting all input buffers wherever
pq_putmessage is called, and I did not want to touch that many different
places. These places where we need pq_putmessage might not be that many
though, I'm not sure.
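
For illustration, a rough sketch of the kind of helper Heikki describes;
pq_putmessage_reserved() and PQ_MSG_HEADER_LEN are made-up names, and a real
version would first flush anything already sitting in PqSendBuffer and loop
on partial writes:

#define PQ_MSG_HEADER_LEN 5             /* 1-byte msgtype + 4-byte length */

/*
 * Caller passes a buffer with PQ_MSG_HEADER_LEN free bytes in front of the
 * payload; the header is filled in here and the whole thing leaves in one
 * write, without being copied into PqSendBuffer.
 */
static int
pq_putmessage_reserved(char msgtype, char *buf, size_t payload_len)
{
    uint32      n32 = pg_hton32((uint32) (payload_len + 4));

    buf[0] = msgtype;
    memcpy(buf + 1, &n32, 4);

    if (secure_write(MyProcPort, buf, PQ_MSG_HEADER_LEN + payload_len) < 0)
        return EOF;
    return 0;
}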


>
> > This change affects places where pq_putmessage is used such as
> > pg_basebackup, COPY TO, walsender etc.
> >
> > I did some experiments to see how the patch performs.
> > Firstly, I loaded ~5GB data into a table [1], then ran "COPY test TO
> > STDOUT". Here are perf results of both the patch and HEAD > ...
> > The patch brings a ~5% gain in socket_putmessage.
> >
> > [1]
> > CREATE TABLE test(id int, name text, time TIMESTAMP);
> > INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 100)
> > AS name, NOW() AS time FROM generate_series(1, 1) AS i;
>
> I'm surprised by these results, because each row in that table is < 600
> bytes. PqSendBufferSize is 8kB, so the optimization shouldn't kick in in
> that test. Am I missing something?
>

You're absolutely right. I made a silly mistake there. I also think that
the way I did perf analysis does not make much sense, even if one row of
the table is greater than 8kB.
Here are some quick timing results after being sure that it triggers this
patch's optimization. I need to think more on how to profile this with
perf. I hope to share proper results soon.

I just added a bit more zeros [1] and ran [2] (hopefully measured the
correct thing)

HEAD:
real2m48,938s
user0m9,226s
sys 1m35,342s

Patch:
real2m40,690s
user0m8,492s
sys 1m31,001s

[1]
 INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 1)
AS name, NOW() AS time FROM generate_series(1, 100) AS i;

[2]
 rm /tmp/dummy && echo 3 | sudo tee /proc/sys/vm/drop_caches && time psql
-d postgres -c "COPY test TO STDOUT;" > /tmp/dummy

Thanks,
-- 
Melih Mutlu
Microsoft


Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Robert Haas
On Tue, Sep 12, 2023 at 10:39 AM Martín Marqués
 wrote:
> The outcome looked for is that the system GUCs that require a restart
> or reload are not modified unless it's through some orchestration or
> someone with physical access to the configuration files (yeah, we
> still have the COPY PROGRAM).

If I understand this correctly, you're saying it's not a security
vulnerability if someone finds a way to use COPY PROGRAM or some other
mechanism to bypass the ALTER SYSTEM restriction, because the point of
the constraint isn't to make it impossible for the superuser to modify
the configuration in a way that they shouldn't, but rather to make it
inconvenient for them to do so.

I have to admit that I'm a little afraid that people will mistake this
for an actual security feature and file bug reports or CVEs about the
superuser being able to circumvent these restrictions. If we add this,
we had better make sure that the documentation is extremely clear
about what we are guaranteeing, or more to the point about what we are
not guaranteeing.

I understand that there's some frustration on the part of Gabriele and
others that this proposal hasn't been enthusiastically adopted, but I
would ask for a little bit of forbearance because those are also, by
and large, not the people who will have to cope with it when we
start getting security researchers threatening to publish our evilness
in the Register. Such conversations are no fun at all. Explaining that
we're not actually evil doesn't tend to work, because the security
researchers are just as convinced that they are right as anyone
arguing for this feature is. Statements like "we don't actually intend
to guarantee X" tend to fall on deaf ears.

In fact, I would go so far as to argue that many of our security
problems (and non-problems) are widely misunderstood even within our
own community, and that far from being something anyone should dismiss
as pedantry, it's actually a critical issue for the project to solve
and something we really need to address in order to be able to move
forward. From that point of view, this feature seems bound to make an
already-annoying problem worse. I don't necessarily expect the people
who are in favor of this feature to accept that as a reason not to do
this, but I do hope to be taken seriously when I say there's a real
issue there. Something can be a serious problem even if it's not YOUR
problem, and in this case, that apparently goes both ways.

I also think that using the GUC system to manage itself is a little
bit suspect. I wonder if it would be better to do this some other way,
e.g. a sentinel file in the data directory. For example, suppose we
refuse ALTER SYSTEM if $PGDATA/disable_alter_system exists, or
something like that. It seems like it would be very easy for an
external management solution (k8s or whatever) to drop that file in
place if desired, and then it would be crystal clear that there's no
way of bypassing the restriction from within the GUC system itself
(though you could still bypass it via filesystem access).
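
A rough sketch of what such a check could look like at the top of the ALTER
SYSTEM code path (the file name, error wording, and placement are assumptions
for illustration, not a worked-out proposal):

    /* Hypothetical: refuse ALTER SYSTEM if a sentinel file exists. */
    struct stat st;
    char        sentinel[MAXPGPATH];

    snprintf(sentinel, sizeof(sentinel), "%s/disable_alter_system", DataDir);
    if (stat(sentinel, &st) == 0)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("ALTER SYSTEM is disabled on this server"),
                 errhint("Remove \"%s\" to allow it again.", sentinel)));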

I agree with those who have said that this shouldn't disable
postgresql.auto.conf, but only the ability of ALTER SYSTEM to modify
it. Right now, third-party tools external to the server can count on
being able to add things to postgresql.auto.conf with the reasonable
expectations that they'll take effect. I'd rather not break that
property.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Flushing large data immediately in pqcomm

2024-01-30 Thread Melih Mutlu
Hi Robert,

Robert Haas , 29 Oca 2024 Pzt, 20:48 tarihinde şunu
yazdı:

> > If there's already some data in PqSendBuffer, I wonder if it would be
> > better to fill it up with data, flush it, and then send the rest of the
> > data directly. Instead of flushing the partial data first. I'm afraid
> > that you'll make a tiny call to secure_write(), followed by a large one,
> > then a tiny one again, and so forth. Especially when socket_putmessage
> > itself writes the msgtype and len, which are tiny, before the payload.
> >
> > Perhaps we should invent a new pq_putmessage() function that would take
> > an input buffer with 5 bytes of space reserved before the payload.
> > pq_putmessage() could then fill in the msgtype and len bytes in the
> > input buffer and send that directly. (Not wedded to that particular API,
> > but something that would have the same effect)
>
> I share the concern; I'm not sure about the best solution. I wonder if
> it would be useful to have pq_putmessagev() in the style of writev()
> et al. Or maybe what we need is secure_writev().
>

I thought about using writev() for not only pq_putmessage() but
pq_putmessage_noblock() too. Currently, pq_putmessage_noblock()
repallocs PqSendBuffer
and copies the input buffer, which can easily be larger than 8kB, into
PqSendBuffer. I
also discussed it with Thomas off-list. The thing is that I believe we
would need secure_writev() with SSL/GSS cases handled properly. I'm just
not sure if the effort would be worthwhile considering what we gain from it.


> I also wonder if the threshold for sending data directly should be
> smaller than the buffer size, and/or whether it should depend on the
> buffer being empty.


You might be right. I'm not sure what the ideal threshold would be.


> If we have an 8kB buffer that currently has
> nothing in it, and somebody writes 2kB, I suspect it might be wrong to
> copy that into the buffer. If the same buffer had 5kB used and 3kB
> free, copying sounds a lot more likely to work out. The goal here is
> probably to condense sequences of short messages into a single
> transmission while sending long messages individually. I'm just not
> quite sure what heuristic would do that most effectively.
>

Sounds like it's difficult to come up with a heuristic that would work well
enough for most cases.
One thing with sending data instead of copying it if the buffer is empty is
that initially the buffer is empty. I believe it will stay empty forever if
we do not copy anything when the buffer is empty. We can maybe simply set
the threshold to the buffer size/2 (4kB) and hope that will work better. Or
copy the data only if it fits into the remaining space in the buffer. What
do you think?
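
As a sketch of the second idea (copy only when the data fits in the remaining
space), the decision in pqcomm.c's send path could look roughly like this,
with s/len standing for the caller's payload and error handling reduced to
the essentials:

    if (len <= (size_t) (PqSendBufferSize - PqSendPointer))
    {
        /* Small enough: keep coalescing into PqSendBuffer as today. */
        memcpy(PqSendBuffer + PqSendPointer, s, len);
        PqSendPointer += len;
    }
    else
    {
        /* Drain whatever is buffered, then send the large payload directly. */
        if (internal_flush())
            return EOF;
        if (secure_write(MyProcPort, (void *) s, len) < 0)
            return EOF;         /* real code would loop on partial writes */
    }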


An additional note while I mentioned pq_putmessage_noblock(), I've been
testing sending input data immediately in pq_putmessage_noblock() without
blocking and copy the data into PqSendBuffer only if the socket would block
and cannot send it. Unfortunately, I don't have strong numbers to
demonstrate any improvement in perf or timing yet. But I still like to know
what would you think about it?

Thanks,
-- 
Melih Mutlu
Microsoft


Re: UUID v7

2024-01-30 Thread Sergey Prokhorenko
typo:
being carried to time step

should be: being carried to timestamp

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Tuesday, 30 January 2024 at 04:35:45 pm GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 30 Jan 2024, at 15:33, Junwang Zhao  wrote:
> 
> It's always good to add a newline at the end of a source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.


Best regards, Andrey Borodin.
  

Re: Bytea PL/Perl transform

2024-01-30 Thread Pavel Stehule
Hi

út 30. 1. 2024 v 18:35 odesílatel Pavel Stehule 
napsal:

>
>
> út 30. 1. 2024 v 18:26 odesílatel Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> napsal:
>
>> Pavel Stehule  writes:
>>
>> > út 30. 1. 2024 v 17:46 odesílatel Dagfinn Ilmari Mannsåker <
>> > ilm...@ilmari.org> napsal:
>> >
>> >> Pavel Stehule  writes:
>> >>
>> >> > út 30. 1. 2024 v 17:18 odesílatel Dagfinn Ilmari Mannsåker <
>> >> > ilm...@ilmari.org> napsal:
>> >> >
>> >> >> Pavel Stehule  writes:
>> >> >>
>> >> >> > út 30. 1. 2024 v 16:43 odesílatel Dagfinn Ilmari Mannsåker <
>> >> >> > ilm...@ilmari.org> napsal:
>> >> >> >
>> >> >> >> Pavel Stehule  writes:
>> >> >> >>
>> >> >> >> > I inserted perl reference support - hstore_plperl and
>> json_plperl
>> >> does
>> >> >> >> it.
>> >> >> >> >
>> >> >> >> > +<->/* Dereference references recursively. */
>> >> >> >> > +<->while (SvROK(in))
>> >> >> >> > +<-><-->in = SvRV(in);
>> >> >> >>
>> >> >> >> That code in hstore_plperl and json_plperl is only relevant
>> because
>> >> they
>> >> >> >> deal with non-scalar values (hashes for hstore, and also arrays
>> for
>> >> >> >> json) which must be passed as references.  The recursive nature
>> of
>> >> the
>> >> >> >> dereferencing is questionable, and masked the bug fixed by commit
>> >> >> >> 1731e3741cbbf8e0b4481665d7d523bc55117f63.
>> >> >> >>
>> >> >> >> bytea_plperl only deals with scalars (specifically strings), so
>> >> should
>> >> >> >> not concern itself with references.  In fact, this code breaks
>> >> returning
>> >> >> >> objects with overloaded stringification, for example:
>> >> >> >>
>> >> >> >> CREATE FUNCTION plperlu_overload() RETURNS bytea LANGUAGE plperlu
>> >> >> >>   TRANSFORM FOR TYPE bytea
>> >> >> >>   AS $$
>> >> >> >> package StringOverload { use overload '""' => sub { "stuff"
>> }; }
>> >> >> >> return bless {}, "StringOverload";
>> >> >> >>   $$;
>> >> >> >>
>> >> >> >> This makes the server crash with an assertion failure from Perl
>> >> because
>> >> >> >> SvPVbyte() was passed a non-scalar value:
>> >> >> >>
>> >> >> >> postgres: ilmari regression_bytea_plperl [local] SELECT:
>> sv.c:2865:
>> >> >> >> Perl_sv_2pv_flags:
>> >> >> >> Assertion `SvTYPE(sv) != SVt_PVAV && SvTYPE(sv) != SVt_PVHV &&
>> >> >> SvTYPE(sv)
>> >> >> >> != SVt_PVFM' failed.
>> >> >> >>
>> >> >> >> If I remove the dereferencing loop it succeeds:
>> >> >> >>
>> >> >> >> SELECT encode(plperlu_overload(), 'escape') AS string;
>> >> >> >>  string
>> >> >> >> 
>> >> >> >>  stuff
>> >> >> >> (1 row)
>> >> >> >>
>> >> >> >> Attached is a v2 patch which removes the dereferencing and
>> includes
>> >> the
>> >> >> >> above example as a test.
>> >> >> >>
>> >> >> >
>> >> >> > But without dereference it returns bad value.
>> >> >>
>> >> >> Where exactly does it return a bad value?  The existing tests pass,
>> and
>> >> >> the one I included shows that it does the right thing in that case
>> too.
>> >> >> If you pass it an unblessed reference it returns the stringified
>> version
>> >> >> of that, as expected.
>> >> >>
>> >> >
>> >> > ugly test code
>> >> >
>> >> > (2024-01-30 13:44:28) postgres=# CREATE or replace FUNCTION
>> >> > perl_inverse_bytes(bytea) RETURNS bytea
>> >> > TRANSFORM FOR TYPE bytea
>> >> > AS $$ my $bytes =  pack 'H*', '0123'; my $ref = \$bytes;
>> >>
>> >> You are returning a reference, not a string.
>> >>
>> >
>> > I know, but for this case, should not be raised an error?
>>
>> I don't think so, as I explained in my previous reply:
>>
>> > There's no reason to ban references, that would break every Perl
>> > programmer's expectations.
>>
>> To elaborate on this: when a function is defined to return a string
>> (which bytea effectively is, as far as Perl is concerned), I as a Perl
>> programmer would expect PL/Perl to just stringify whatever value I
>> returned, according to the usual Perl rules.
>>
>
> ok
>
> Pavel
>
>
>>
>> I also said:
>>
>> > If we really want to be strict, we should at least allow references to
>> > objects that overload stringification, as they are explicitly designed
>> > to be well-behaved as strings.  But that would be a lot of extra code
>> > for very little benefit over just letting Perl stringify everything.
>>
>
>> By "a lot of code", I mean everything `string_amg`-related in the
>> amagic_applies() function
>> (https://github.com/Perl/perl5/blob/v5.38.0/gv.c#L3401-L3545).  We can't
>> just call it: it's only available since Perl 5.38 (released last year),
>> and we support Perl versions all the way back to 5.14.
>>
>> - ilmari
>>
>
I marked this patch as ready for committer.

It is almost trivial, make check-world, make doc passed

Regards

Pavel


Re: Flushing large data immediately in pqcomm

2024-01-30 Thread Robert Haas
On Tue, Jan 30, 2024 at 12:58 PM Melih Mutlu  wrote:
> Sounds like it's difficult to come up with a heuristic that would work well 
> enough for most cases.
> One thing with sending data instead of copying it if the buffer is empty is 
> that initially the buffer is empty. I believe it will stay empty forever if 
> we do not copy anything when the buffer is empty. We can maybe simply set the 
> threshold to the buffer size/2 (4kB) and hope that will work better. Or copy 
> the data only if it fits into the remaining space in the buffer. What do you 
> think?
>
> An additional note while I mentioned pq_putmessage_noblock(), I've been 
> testing sending input data immediately in pq_putmessage_noblock() without 
> blocking and copy the data into PqSendBuffer only if the socket would block 
> and cannot send it. Unfortunately, I don't have strong numbers to demonstrate 
> any improvement in perf or timing yet. But I still like to know what would 
> you think about it?

I think this is an area where it's very difficult to foresee on
theoretical grounds what will be right in practice. The problem is
that the best algorithm probably depends on what usage patterns are
common in practice. I think one common usage pattern will be a bunch
of roughly equal-sized messages in a row, like CopyData or DataRow
messages -- but those messages won't have a consistent width. It would
probably be worth testing what behavior you see in such cases -- start
with say a stream of 100 byte messages and then gradually increase and
see how the behavior evolves.

But you can also have other patterns, with messages of different sizes
interleaved. In the case of FE-to-BE traffic, the extended query
protocol might be a good example of that: the Parse message could be
quite long, or not, but the Bind Describe Execute Sync messages that
follow are probably all short. That case doesn't arise in this
direction, but I can't think exactly of which cases do. It seems
like someone would need to play around and try some different cases
and maybe log the sizes of the secure_write() calls with various
algorithms, and then try to figure out what's best. For example, if
the alternating short-write, long-write behavior that Heikki mentioned
is happening, and I do think that particular thing is a very real
risk, then you haven't got it figured out yet...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add LSN <-> time conversion functionality

2024-01-30 Thread Melanie Plageman
On Wed, Dec 27, 2023 at 5:16 PM Melanie Plageman
 wrote:
>
> Elsewhere [1] I required a way to estimate the time corresponding to a
> particular LSN in the past. I devised the attached LSNTimeline, a data
> structure mapping LSNs <-> timestamps with decreasing precision for
> older time, LSN pairs. This can be used to locate and translate a
> particular time to LSN or vice versa using linear interpolation.

Attached is a new version which fixes one overflow danger I noticed in
the original patch set.

I have also been doing some thinking about the LSNTimeline data
structure. Its array elements are combined before all elements have
been used. This sacrifices precision earlier than required. I tried
some alternative structures that would use the whole array. There are
a lot of options, though. Currently each element fits twice as many
members as the preceding element. To use the whole array, we'd have to
change the behavior from filling each element to its max capacity to
something that filled elements only partially. I'm not sure what the
best distribution would be.
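
For readers following along, the conversion itself is plain linear
interpolation between the two bracketing LSNTimes; a minimal sketch, with
field names as in the patch below:

static XLogRecPtr
interpolate_lsn(const LSNTime *start, const LSNTime *end, TimestampTz target)
{
    double      fraction;

    if (end->time == start->time)
        return end->lsn;

    /* fraction of the elapsed time between the two points */
    fraction = (double) (target - start->time) /
               (double) (end->time - start->time);

    return start->lsn +
           (XLogRecPtr) ((double) (end->lsn - start->lsn) * fraction);
}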

> I've added an instance of the LSNTimeline to PgStat_WalStats and insert
> new values to it in background writer's main loop. This patch set also
> introduces some new pageinspect functions exposing LSN <-> time
> translations.

I was thinking that maybe it is silly to have the functions allowing
for translation between LSN and time in the pageinspect extension --
since they are not specifically related to pages (pages are just an
object that has an accessible LSN). I was thinking perhaps we add them
as system information functions. However, the closest related
functions I can think of are those to get the current LSN (like
pg_current_wal_lsn ()). And those are listed as system administration
functions under backup control [1]. I don't think the LSN <-> time
functionality fits under backup control.

If I did put them in one of the system information function sections
[2], which one would work best?

- Melanie

[1] 
https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-ADMIN-BACKUP
[2] https://www.postgresql.org/docs/devel/functions-info.html#FUNCTIONS-INFO
From 8590125d66ce366b35251e5aff14db1a858edda9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 27 Dec 2023 16:40:27 -0500
Subject: [PATCH v2 2/5] Add LSNTimeline for converting LSN <-> time

Add a new structure, LSNTimeline, consisting of LSNTimes -- each an LSN,
time pair. Each LSNTime can represent multiple logical LSN, time pairs,
referred to as members. LSN <-> time conversions can be done using
linear interpolation with two LSNTimes on the LSNTimeline.

This commit does not add a global instance of LSNTimeline. It adds the
structures and functions needed to maintain and access such a timeline.
---
 src/backend/utils/activity/pgstat_wal.c | 199 
 src/include/pgstat.h|  34 
 src/tools/pgindent/typedefs.list|   2 +
 3 files changed, 235 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 1a3c0e1a669..e8d9660f82e 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -17,8 +17,11 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "utils/pgstat_internal.h"
 #include "executor/instrument.h"
+#include "utils/builtins.h"
+#include "utils/timestamp.h"
 
 
 PgStat_PendingWalStats PendingWalStats = {0};
@@ -32,6 +35,12 @@ PgStat_PendingWalStats PendingWalStats = {0};
 static WalUsage prevWalUsage;
 
 
+static void lsntime_absorb(LSNTime *a, const LSNTime *b);
+void lsntime_insert(LSNTimeline *timeline, TimestampTz time, XLogRecPtr lsn);
+
+XLogRecPtr estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time);
+TimestampTz estimate_time_at_lsn(const LSNTimeline *timeline, XLogRecPtr lsn);
+
 /*
  * Calculate how much WAL usage counters have increased and update
  * shared WAL and IO statistics.
@@ -184,3 +193,193 @@ pgstat_wal_snapshot_cb(void)
 		   sizeof(pgStatLocal.snapshot.wal));
 	LWLockRelease(&stats_shmem->lock);
 }
+
+/*
+ * Set *a to be the earlier of *a or *b.
+ */
+static void
+lsntime_absorb(LSNTime *a, const LSNTime *b)
+{
+	LSNTime		result;
+	uint64		new_members = a->members + b->members;
+
+	if (a->time < b->time)
+		result = *a;
+	else if (b->time < a->time)
+		result = *b;
+	else if (a->lsn < b->lsn)
+		result = *a;
+	else if (b->lsn < a->lsn)
+		result = *b;
+	else
+		result = *a;
+
+	*a = result;
+	a->members = new_members;
+}
+
+/*
+ * Insert a new LSNTime into the LSNTimeline in the first element with spare
+ * capacity.
+ */
+void
+lsntime_insert(LSNTimeline *timeline, TimestampTz time,
+			   XLogRecPtr lsn)
+{
+	LSNTime		temp;
+	LSNTime		carry = {.lsn = lsn,.time = time,.members = 1};
+
+	for (int i = 0; i < timeline->length; i++)
+	{
+		bool		full;
+		LSNTime*cur = &timeline->data[i];
+
+		/*
+		 * An array element's capacity to represent members is 2 ^ it

Re: Schema variables - new implementation for Postgres 15

2024-01-30 Thread Dmitry Dolgov
Yep, in this constellation the implementation holds much better (in
terms of memory) in my create/let/drop testing.

I've marked the CF item as ready for committer, but a note for anyone
who would like to pick up it from here -- we're talking about first 5
patches here, up to the memory cleaning after DROP VARIABLE. It doesn't
mean the rest is somehow not worth it, but I believe it's a good first
step.




Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Tom Lane
Robert Haas  writes:
> I have to admit that I'm a little afraid that people will mistake this
> for an actual security feature and file bug reports or CVEs about the
> superuser being able to circumvent these restrictions. If we add this,
> we had better make sure that the documentation is extremely clear
> about what we are guaranteeing, or more to the point about what we are
> not guaranteeing.

> I understand that there's some frustration on the part of Gabriele and
> others that this proposal hasn't been enthusiastically adopted, but I
> would ask for a little bit of forbearance because those are also, by
> and large, not the people who will have to cope with it when we
> start getting security researchers threatening to publish our evilness
> in the Register. Such conversations are no fun at all.

Indeed.  I'd go so far as to say that we should reject not only this
proposal, but any future ones that intend to prevent superusers from
doing things that superusers normally could do (and, indeed, are
normally expected to do).  That sort of thing is not part of our
security model, never has been, and it's simply naive to believe that
it won't have a boatload of easily-reachable holes in it.  Which we
*will* get complaints about, if we claim that thus-and-such feature
prevents it.  So why bother?  Don't give out superuser to people you
don't trust to not do the things you wish they wouldn't.

> I also think that using the GUC system to manage itself is a little
> bit suspect.

Something like contrib/sepgsql would be a better mechanism, perhaps.

regards, tom lane




Re: Schema variables - new implementation for Postgres 15

2024-01-30 Thread Pavel Stehule
út 30. 1. 2024 v 20:15 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
napsal:

> Yep, in this constellation the implementation holds much better (in
> terms of memory) in my create/let/drop testing.
>
> I've marked the CF item as ready for committer, but a note for anyone
> who would like to pick up it from here -- we're talking about first 5
> patches here, up to the memory cleaning after DROP VARIABLE. It doesn't
> mean the rest is somehow not worth it, but I believe it's a good first
> step.
>

Thank you very much

Pavel


Re: Fix some ubsan/asan related issues

2024-01-30 Thread Heikki Linnakangas

On 30/01/2024 17:57, Tristan Partin wrote:

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, and there
didn't seem to be anything preventing that in the LogLogicalMessage()
codepath. Here is a preventative measure in LogLogicalMessage() and an
Assert() in CopyXLogRecordToWAL().


For the audience: We ran into this one with the neon extension. The 
solution is to call LogLogicalMessage("", 0, ...) instead of 
LogLogicalMessage(NULL, 0, ...). But it's true that it's pointless for 
LogLogicalMessage to call XLogRegisterData() if the message is empty. If 
we do this, we should check for 'size == 0' rather than 'message == NULL'.
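
A sketch of where that check might sit, loosely modeled on the existing
LogLogicalMessage() registration sequence (the exact calls in message.c may
differ slightly):

    XLogBeginInsert();
    XLogRegisterData((char *) &xlrec, SizeOfLogicalMessage);
    XLogRegisterData((char *) prefix, xlrec.prefix_size);
    if (size > 0)               /* nothing to register for an empty payload */
        XLogRegisterData((char *) message, size);

    return XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);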



Patch 2:

Support building with -Db_sanitize=address in Meson. Various executables
are leaky, which can cause builds and tests to fail even though we
are fine with leaking this memory.

Personally, I am a big stickler for always freeing my memory in
executables even if it gets released at program exit because it makes
the leak sanitizer much more effective. However (!), I am not here to
change the philosophy of memory management in one-off executables, so
I have just silenced memory leaks in various executables for now.

Patch 3:

THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!

In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.

Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.


I'm a bit confused by these. We already run with sanitizer in the cirrus 
CI. What does this enable that we're not already doing?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [17] CREATE SUBSCRIPTION ... SERVER

2024-01-30 Thread Jeff Davis
On Tue, 2024-01-30 at 16:17 +0530, Ashutosh Bapat wrote:
> Converting a server and user mapping to
> conninfo should be delegated to the FDW being used since that FDW
> knows best how to use those options.

If I understand you correctly, you mean that there would be a new
optional function associated with an FDW (in addition to the HANDLER
and VALIDATOR) like "CONNECTION", which would be able to return the
conninfo from a server using that FDW. Is that right?
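
Purely as illustration of the idea (hypothetical syntax; nothing like this
exists today and the thread has not settled on an API):

-- hypothetical: a third support function alongside HANDLER and VALIDATOR
CREATE FOREIGN DATA WRAPPER postgres_fdw
    HANDLER postgres_fdw_handler
    VALIDATOR postgres_fdw_validator
    CONNECTION postgres_fdw_connection;  -- maps (server, user mapping) to a conninfo string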

I like the idea -- it further decouples the logic from the core server.
I suspect it will make postgres_fdw the primary way (though not the
only possible way) to use this feature. There would be little need to
create a new builtin FDW to make this work.

To get the subscription invalidation right, we'd need to make the
(reasonable) assumption that the connection information is based only
on the FDW, server, and user mapping. A FDW wouldn't be able to use,
for example, some kind of configuration table or GUC to control how the
connection string gets created. That's easy enough to solve with
documentation.

I'll work up a new patch for this.


Regards,
Jeff Davis





Re: Oversight in reparameterize_path_by_child leading to executor crash

2024-01-30 Thread Tom Lane
Richard Guo  writes:
> On Wed, Jan 17, 2024 at 5:01 PM Richard Guo  wrote:
>> Sure, here it is:
>> v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch

> I forgot to mention that this patch applies on v16 not master.

I spent some time looking at this patch (which seems more urgent than
the patch for master, given that we have back-branch releases coming
up).  There are two things I'm not persuaded about:

* Why is it okay to just use pull_varnos on a tablesample expression,
when contain_references_to() does something different?

* Is contain_references_to() doing the right thing by specifying
PVC_RECURSE_PLACEHOLDERS?  That causes it to totally ignore a
PlaceHolderVar's ph_eval_at, and I'm not convinced that's correct.

Ideally it seems to me that we want to reject a PlaceHolderVar
if either its ph_eval_at or ph_lateral overlap the other join
relation; if it was coded that way then we'd not need to recurse
into the PHV's contents.   pull_varnos isn't directly amenable
to this, but I think we could use pull_var_clause with
PVC_INCLUDE_PLACEHOLDERS and then iterate through the resulting
list manually.  (If this patch were meant for HEAD, I'd think
about extending the var.c code to support this usage more directly.
But as things stand, this is a one-off so I think we should just do
what we must in reparameterize_path_by_child.)
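
Roughly, the manual iteration could look like this sketch (root, expr and
other_relids stand in for the caller's context; the PlaceHolderInfo lookup
and its arguments differ a bit across branches):

    List       *vars = pull_var_clause(expr, PVC_INCLUDE_PLACEHOLDERS);
    ListCell   *lc;

    foreach(lc, vars)
    {
        Node       *node = (Node *) lfirst(lc);

        if (IsA(node, PlaceHolderVar))
        {
            PlaceHolderInfo *phinfo =
                find_placeholder_info(root, (PlaceHolderVar *) node);

            if (bms_overlap(phinfo->ph_eval_at, other_relids) ||
                bms_overlap(phinfo->ph_lateral, other_relids))
                return true;    /* references the other rel, don't reparameterize */
        }
        else if (bms_is_member(((Var *) node)->varno, other_relids))
            return true;
    }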

BTW, it shouldn't be necessary to write either PVC_RECURSE_AGGREGATES
or PVC_RECURSE_WINDOWFUNCS, because neither kind of node should ever
appear in a scan-level expression.  I'd leave those out so that we
get an error if something unexpected happens.

regards, tom lane




Re: Some revises in adding sorting path

2024-01-30 Thread David Rowley
On Wed, 31 Jan 2024 at 00:44, Richard Guo  wrote:
> This patchset does not aim to introduce anything new; it simply
> refactors the existing code.  The newly added tests are used to show
> that the code that is touched here is not redundant, but rather
> essential for generating certain paths.  I remember the tests were added
> per your comment in [3].
>
> [3] 
> https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com

OK.  I've pushed the patch based on it being a simplification of the
partial path generation.

David




Re: Fix some ubsan/asan related issues

2024-01-30 Thread Andres Freund
Hi,

On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote:
> On 30/01/2024 17:57, Tristan Partin wrote:
> > In my effort to try to see if the test suite would pass with asan
> > enabled, I ran into a max_stack_depth issue. I tried maxing it out
> > (hence, the patch), but that still didn't remedy my issue. I tried to
> > look on the list for any relevant emails, but nothing turned up. Maybe
> > others have not attempted this before? Seems doubtful.
> > 
> > Not entirely sure how to fix this issue. I personally find asan
> > extremely effective, even more than valgrind, so it would be great if
> > I could run Postgres with asan enabled to catch various stupid C issues
> > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
> > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
> > seem to leave enough stack space for Postgres.
> 
> I'm a bit confused by these. We already run with sanitizer in the cirrus CI.
> What does this enable that we're not already doing?

The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes, which confuses our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...

The checks are actually quite useful, so making our stack depth check work
with asan would be worthwhile.

I discussed this in a bit more in
https://postgr.es/m/20231129193920.4vphw7dqxjzf5v5b%40awork3.anarazel.de

Greetings,

Andres Freund




Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Robert Haas
On Tue, Jan 30, 2024 at 2:20 PM Tom Lane  wrote:
> Indeed.  I'd go so far as to say that we should reject not only this
> proposal, but any future ones that intend to prevent superusers from
> doing things that superusers normally could do (and, indeed, are
> normally expected to do).  That sort of thing is not part of our
> security model, never has been, and it's simply naive to believe that
> it won't have a boatload of easily-reachable holes in it.  Which we
> *will* get complaints about, if we claim that thus-and-such feature
> prevents it.  So why bother?  Don't give out superuser to people you
> don't trust to not do the things you wish they wouldn't.

In my opinion, we need to have the conversation, whereas you seem to
want to try to shut it down before it starts. If we take that
approach, people are going to get (more) frustrated.

Also in my opinion, there is a fair amount of nuance here. On the one
hand, I and others put a lot of work into making it possible to not
give people superuser and still be able to do a controlled subset of
the things that a superuser can do. For example, thanks to Mark
Dilger's work, you can make somebody not a superuser and still allow
them to set GUCs that can normally be set only by superusers, and you
can choose which GUCs you do and do not want them to be able to set.
And, thanks to my work, you can make someone a CREATEROLE user without
letting them escalate to superuser, and you can then allow them to
manage the users that they create almost exactly as if they were a
superuser, with only the limitations that seem necessary to maintain
system security. It is worth asking - and I would like to hear a real,
non-flip answer - why someone who wants to do what is proposed here
isn't using those mechanisms instead of handing out SUPERUSER and then
complaining that it grants too much power.

On the other hand, I don't see why it isn't legitimate to imagine a
scenario where there is no security boundary between the Kubernetes
administrator and the PostgreSQL DBA, and yet the PostgreSQL DBA
should still be pushed in the direction of doing things in a way that
doesn't break Kubernetes. It surprises me a little bit that Gabriele
and others want to build the system that way, though, because you
might expect that in a typical install the Kubernetes administrator
would want to FORCIBLY PREVENT the PostgreSQL administrator from
messing things up instead of doing what is proposed here, which
amounts to suggesting perhaps the PostgreSQL administrator would be
kind enough not to mess things up. Nonetheless, there's no law against
suggestions. When my wife puts the ground beef that I'm supposed to
use to cook dinner at the top of the freezer and the stuff I'm
supposed to not use at the bottom, nothing prevents me from digging
out the other ground beef and using it, but I don't, because I can
take a hint. And indeed, I benefit from that hint. This seems like it
could be construed as a very similar type of hint.

I don't think we should pretend like one of the two paragraphs above
is valid and the other is hot garbage. That's not solving anything. We
can't resolve the tension between those two things in either direction
by somebody hammering on the side of the argument that they believe to
be correct and ignoring the other one.

> Something like contrib/sepgsql would be a better mechanism, perhaps.

There's nothing wrong with that exactly, but what does it gain us over
my proposal of a sentinel file? I don't see much value in adding a
hook and then a module that uses that hook to return false or
unconditionally ereport.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 30, 2024 at 2:20 PM Tom Lane  wrote:
>> Indeed.  I'd go so far as to say that we should reject not only this
>> proposal, but any future ones that intend to prevent superusers from
>> doing things that superusers normally could do (and, indeed, are
>> normally expected to do).

> Also in my opinion, there is a fair amount of nuance here. On the one
> hand, I and others put a lot of work into making it possible to not
> give people superuser and still be able to do a controlled subset of
> the things that a superuser can do.

Sure, and that is a line of thought that we should continue to pursue.
But we already have enough mechanism to let a non-superuser set only
the ALTER SYSTEM stuff she's authorized to.  There is no reason to
think that a non-superuser could break through that restriction at
all, let alone easily.  So that's an actual security feature, not
security theater.  I don't see how the feature proposed here isn't
security theater, or at least close enough to that.

>> Something like contrib/sepgsql would be a better mechanism, perhaps.

> There's nothing wrong with that exactly, but what does it gain us over
> my proposal of a sentinel file?

I was imagining using selinux and/or sepgsql to directly prevent
writing postgresql.auto.conf from the Postgres account.  Combine that
with a non-Postgres-owned postgresql.conf (already supported) and you
have something that seems actually bulletproof, rather than a hint.
Admittedly, using that approach requires knowing something about a
non-Postgres security mechanism.

regards, tom lane




Re: [PATCH] Add native windows on arm64 support

2024-01-30 Thread Andrew Dunstan


On 2024-01-30 Tu 09:50, Dave Cramer wrote:



On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan  wrote:


On 2024-01-29 Mo 11:20, Dave Cramer wrote:


Dave Cramer
www.postgres.rocks 


On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan
 wrote:


On 2024-01-26 Fr 09:18, Dave Cramer wrote:



On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan
 wrote:


On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer
wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan
 wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt
for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing
"vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here.  Based on [1], x64_arm64 means you
can use a x64
> host and you'll be able to produce ARM64 builds, still
these will not
> be able to run on the host where they were built.  How
much of the
> patch posted upthread is required to produce such
builds?  Basically
> everything from it, I guess, so as build dependencies
can be
> satisfied?
>
> [1]:

https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170


If you look at the table here x86 and x64 are the only
supported host
architectures. But that's OK, the x64 binaries will run
on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work
Dave and I would
not have got as far as we have. But you want the
x64_arm64 argument to
vcvarsall so you will get ARM64 output.


I've rebuilt it using  x64_arm64 and with the attached (very
naive patch) and I still get an x64 binary :(



With this patch I still get a build error, but it's different :-)


[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
   Creating library src\backend\postgres.exe.lib

storage_lmgr_s_lock.c.obj : error LNK2019: unresolved
external symbol spin_delay referenced in function
perform_spin_delay

src\backend\postgres.exe : fatal error LNK1120: 1 unresolved
externals


Did you add the latest lock.patch ?





I'm a bit confused about exactly what needs to be applied. Can you
supply a complete patch to be applied to a pristine checkout that
will let me build?


cheers


See attached.




No, that is what is giving me the error shown above (just tried again to 
be certain). And it's not surprising, as patch 2 #ifdef's out the 
definition of spin_delay().


If you can get a complete build with these patches then I suspect you're 
not doing a proper ARM64 build.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Fix some ubsan/asan related issues

2024-01-30 Thread Andres Freund
Hi,

On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Mon, 29 Jan 2024 18:03:39 -0600
> Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
>  NULL

> If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> the API contract of memcpy in glibc. The two pointer arguments are
> marked as nonnull, even in the event the amount to copy is 0 bytes.

It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?


> From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 24 Jan 2024 17:07:01 -0600
> Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> 
> The ecpg parser is extremely leaky, so we need to silence leak
> detection.

This does stuff beyond ecpg...


> +if get_option('b_sanitize').contains('address')
> +  cdata.set('USE_ADDRESS_SANITIZER', 1)
> +endif
>  
>  ###
>  # NLS / Gettext
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index ac409b0006..e18e716d9c 100644
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
> @@ -338,6 +338,17 @@ do { \
>   output_failed = true, output_errno = errno; \
>  } while (0)
>  
> +#ifdef USE_ADDRESS_SANITIZER

When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.


> +const char *__asan_default_options(void);
> +
> +const char *__asan_default_options(void)
> +{
> + return "detect_leaks=0";
> +}
> +
> +#endif

Wonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...

Greetings,

Andres Freund




Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Magnus Hagander
On Tue, Jan 30, 2024 at 10:48 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > There's nothing wrong with that exactly, but what does it gain us over
> > my proposal of a sentinel file?
>
> I was imagining using selinux and/or sepgsql to directly prevent
> writing postgresql.auto.conf from the Postgres account.  Combine that
> with a non-Postgres-owned postgresql.conf (already supported) and you
> have something that seems actually bulletproof, rather than a hint.
> Admittedly, using that approach requires knowing something about a
> non-Postgres security mechanism.

Wouldn't a simple "chattr +i postgresql.auto.conf" work?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-01-30 Thread Andrew Dunstan



On 2024-01-30 Tu 06:49, Andrew Dunstan wrote:


On 2024-01-30 Tu 06:21, Nazir Bilal Yavuz wrote:

Hi,

I was trying to install newer Perl versions to Windows CI images and
found that 003_extrafiles.pl test fails on Windows with:

(0.183s) not ok 2 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] = 'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'
# $expected->[0] = 'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'

(0.263s) not ok 5 - file lists match
(0.000s) #   Failed test 'file lists match'
#   at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
#  $got->[0] = 'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'
# $expected->[0] = 'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'

It looks like File::Find converts backslashes to slashes in the newer
Perl versions. I tried to find the related commit and found this:
https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d 



To solve this, I did the same conversion for Windows before comparing
the paths. And to support all Perl versions, I decided to always
convert them on Windows regardless of the Perl version. The fix is
attached.

I looked at other File::Find appearances in the code but they do not
compare the paths. So, I do not think there is any need to fix them.

Any kind of feedback would be appreciated.



Looks reasonable on the face of it. I'll see about pushing this today.



Pushed to all live branches. Thanks for the patch.


cheers


andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: psql not responding to SIGINT upon db reconnection

2024-01-30 Thread Tristan Partin

On Fri Jan 12, 2024 at 11:13 AM CST, Tristan Partin wrote:

On Fri Jan 12, 2024 at 10:45 AM CST, Robert Haas wrote:
> On Mon, Jan 8, 2024 at 1:03 AM Tristan Partin  wrote:
> > I think the way to go is to expose some variation of libpq's
> > pqSocketPoll(), which I would be happy to put together a patch for.
> > Making frontends, psql in this case, have to reimplement the polling
> > logic doesn't strike me as fruitful, which is essentially what I have
> > done.
>
> I encourage further exploration of this line of attack. I fear that if
> I were to commit something like what you've posted up until now,
> people would complain that that code was too ugly to live, and I'd
> have a hard time telling them that they're wrong.

Completely agree. Let me look into this. Perhaps I can get something up 
next week or the week after.


Not next week, but here is a respin. I've exposed pqSocketPoll as 
PQsocketPoll and am just using that. You can see the diff is so much 
smaller, which is great!


In order to fight the race condition, I am just using a 1 second timeout 
instead of trying to integrate pselect or ppoll. We could add 
a PQsocketPPoll() to support those use cases, but I am not sure how 
available pselect and ppoll are. I guess on Windows we don't have 
pselect. I don't think using the pipe trick that Heikki mentioned 
earlier is suitable to expose via an API in libpq, but someone else 
might have a different opinion.


Maybe this is good enough until someone complains? Most people would 
probably just chalk any latency between keypress and cancellation up to 
network latency rather than a hardcoded 1 second.
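
(For context, a rough sketch of the kind of loop a frontend could write
with the exposed function; this is illustrative, not code from the patch,
the helper name is invented, and the error handling is deliberately
minimal:)

#include <errno.h>
#include <time.h>
#include "libpq-fe.h"

/*
 * Sketch: drive an asynchronous connection attempt, waiting at most one
 * second per iteration so the caller gets a chance to react to, e.g.,
 * a pending cancellation request.
 */
static PGconn *
connect_with_poll(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return conn;

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		int			rc;

		rc = PQsocketPoll(PQsocket(conn),
						  status == PGRES_POLLING_READING,
						  status == PGRES_POLLING_WRITING,
						  time(NULL) + 1);	/* absolute one-second deadline */

		if (rc > 0)
			status = PQconnectPoll(conn);	/* socket ready, advance the state machine */
		else if (rc < 0 && errno != EINTR)
			break;				/* genuine poll()/select() failure */
		/* rc == 0: timed out, loop and wait again */
	}

	return conn;
}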


Thanks for your feedback Robert!

--
Tristan Partin
Neon (https://neon.tech)
From 4fa6db1900a355a171d3e16019d02f3a415764d0 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 30 Jan 2024 15:40:57 -0600
Subject: [PATCH v8 1/2] Expose PQsocketPoll for frontends

PQsocketPoll should help with state machine processing when connecting
to a database asynchronously via PQconnectStart().
---
 doc/src/sgml/libpq.sgml | 5 -
 src/interfaces/libpq/fe-misc.c  | 7 +++
 src/interfaces/libpq/libpq-fe.h | 4 
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d0d5aefadc..aa26c2cc8d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -358,7 +358,10 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn);
Loop thus: If PQconnectPoll(conn) last returned
PGRES_POLLING_READING, wait until the socket is ready to
read (as indicated by select(), poll(), or
-   similar system function).
+   similar system function).  Note that PQsocketPoll
+   can help reduce boilerplate by abstracting the setup of
+   select(), or poll() if it is
+   available on your system.
Then call PQconnectPoll(conn) again.
Conversely, if PQconnectPoll(conn) last returned
PGRES_POLLING_WRITING, wait until the socket is ready
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 47a28b0a3a..ee917d375d 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -55,7 +55,6 @@ static int	pqPutMsgBytes(const void *buf, size_t len, PGconn *conn);
 static int	pqSendSome(PGconn *conn, int len);
 static int	pqSocketCheck(PGconn *conn, int forRead, int forWrite,
 		  time_t end_time);
-static int	pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time);
 
 /*
  * PQlibVersion: return the libpq version number
@@ -1059,7 +1058,7 @@ pqSocketCheck(PGconn *conn, int forRead, int forWrite, time_t end_time)
 
 	/* We will retry as long as we get EINTR */
 	do
-		result = pqSocketPoll(conn->sock, forRead, forWrite, end_time);
+		result = PQsocketPoll(conn->sock, forRead, forWrite, end_time);
 	while (result < 0 && SOCK_ERRNO == EINTR);
 
 	if (result < 0)
@@ -1083,8 +1082,8 @@ pqSocketCheck(PGconn *conn, int forRead, int forWrite, time_t end_time)
  * Timeout is infinite if end_time is -1.  Timeout is immediate (no blocking)
  * if end_time is 0 (or indeed, any time before now).
  */
-static int
-pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+int
+PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time)
 {
 	/* We use poll(2) if available, otherwise select(2) */
 #ifdef HAVE_POLL
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index defc415fa3..11a7fd32b6 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -21,6 +21,7 @@ extern "C"
 #endif
 
 #include 
+#include 
 
 /*
  * postgres_ext.h defines the backend's externally visible types,
@@ -644,6 +645,9 @@ extern int	lo_export(PGconn *conn, Oid lobjId, const char *filename);
 /* Get the version of the libpq library in use */
 extern int	PQlibVersion(void);
 
+/* Poll a file descriptor for reading and/or writing with a timeout */
extern int	PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time);
