Re: optimize file transfer in pg_upgrade

2025-03-01 Thread Nathan Bossart
I've spent quite a bit of time recently trying to get this patch set into a
reasonable state.  It's still a little rough around the edges, and the code
for the generated scripts is incomplete, but I figured I'd at least get
some CI testing going.
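
As a usage sketch (per the patch below; paths are made up, and the calling
tool is responsible for syncing the skipped files):

    initdb -D /path/to/newcluster --no-sync-data-files
    # base/ and other tablespace files are NOT synced; the caller must
    # fsync them itself.  pg_wal/, pg_xact/, etc. are still synced,
    # unless --no-sync is also given.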

-- 
nathan
>From 0af23114cfe5d00ab0b69ff804bb92d58d485adb Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 19 Feb 2025 09:14:51 -0600
Subject: [PATCH v3 1/4] initdb: Add --no-sync-data-files.

This new option instructs initdb to skip synchronizing any files
in database directories and the database directories themselves,
i.e., everything in the base/ subdirectory and any other
tablespace directories.  Other files, such as those in pg_wal/ and
pg_xact/, will still be synchronized unless --no-sync is also
specified.  --no-sync-data-files is primarily intended for internal
use by tools that separately ensure the skipped files are
synchronized to disk.  A follow-up commit will use this to help
optimize pg_upgrade's file transfer step.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
---
 doc/src/sgml/ref/initdb.sgml| 20 +
 src/bin/initdb/initdb.c | 10 ++-
 src/bin/initdb/t/001_initdb.pl  |  1 +
 src/bin/pg_basebackup/pg_basebackup.c   |  2 +-
 src/bin/pg_checksums/pg_checksums.c |  2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c |  2 +-
 src/bin/pg_rewind/file_ops.c|  2 +-
 src/common/file_utils.c | 85 +
 src/include/common/file_utils.h |  2 +-
 9 files changed, 89 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 0026318485a..14c401b9a99 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -527,6 +527,26 @@ PostgreSQL documentation
   
  
 
+ <varlistentry id="app-initdb-option-no-sync-data-files">
+  <term><option>--no-sync-data-files</option></term>
+  <listitem>
+   <para>
+    By default, <command>initdb</command> safely writes all database files
+    to disk.  This option instructs <command>initdb</command> to skip
+    synchronizing all files in the individual database directories and the
+    database directories themselves, i.e., everything in the
+    <filename>base</filename> subdirectory and any other tablespace
+    directories.  Other files, such as those in <filename>pg_wal</filename>
+    and <filename>pg_xact</filename>, will still be synchronized unless the
+    <option>--no-sync</option> option is also specified.
+   </para>
+   <para>
+    This option is primarily intended for internal use by tools that
+    separately ensure the skipped files are synchronized to disk.
+   </para>
+  </listitem>
+ </varlistentry>
+
  
   --no-instructions
   
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 21a0fe3ecd9..22b7d31b165 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,7 @@ static bool data_checksums = true;
 static char *xlog_dir = NULL;
 static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
 static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static bool sync_data_files = true;
 
 
 /* internal vars */
@@ -2566,6 +2567,7 @@ usage(const char *progname)
printf(_("  -L DIRECTORY  where to find the input 
files\n"));
printf(_("  -n, --no-cleando not clean up after errors\n"));
printf(_("  -N, --no-sync do not wait for changes to be 
written safely to disk\n"));
+   printf(_("  --no-sync-data-files  do not sync files within database 
directories\n"));
printf(_("  --no-instructions do not print instructions for 
next steps\n"));
printf(_("  -s, --showshow internal settings, then 
exit\n"));
printf(_("  --sync-method=METHOD  set method for syncing files to 
disk\n"));
@@ -3208,6 +3210,7 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+   {"no-sync-data-files", no_argument, NULL, 21},
{NULL, 0, NULL, 0}
};
 
@@ -3402,6 +3405,9 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+   case 21:
+   sync_data_files = false;
+   break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more 
information.", progname);
@@ -3453,7 +3459,7 @@ main(int argc, char *argv[])
 
fputs(_("syncing data to disk ... "), stdout);
fflush(stdout);
-   sync_pgdata(pg_data, PG_VERSION_NUM, sync_method);
+	sync_pgdata(pg_data, PG_VERSION_NUM, sync_method, sync_data_files);
check_ok();
return 0;
}

Re: optimize file transfer in pg_upgrade

2025-03-01 Thread Robert Haas
On Fri, Feb 28, 2025 at 2:40 PM Robert Haas  wrote:
> Maybe we should rethink the decision not to transfer relfilenodes for
> sequences. Or have more than one way to do it. pg_upgrade
> --binary-upgrade --binary-upgrade-even-for-sequences, or whatever.

Sorry, I meant: pg_dump --binary-upgrade --binary-upgrade-even-for-sequences

i.e. pg_upgrade could decide which way to ask pg_dump to do it,
depending on versions and flags.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: making EXPLAIN extensible

2025-03-01 Thread Thom Brown
On Fri, 28 Feb 2025 at 19:26, Robert Haas  wrote:
>
> Prior to PostgreSQL 10, EXPLAIN had just 2 options: VERBOSE and
> ANALYZE. Now, we're up to 12 options, which is already quite a lot,
> and there's plenty more things that somebody might like to do.
> However, not all of those things necessarily need to be part of the
> core code. My original reason for wanting to extend EXPLAIN was that I
> was thinking about an extension that would want to do a bunch of
> things and one of those things would be to add some information to the
> EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN
> option whose whole purpose is to cater to the needs of some extension,
> so that made me think of providing some extensibility infrastructure.
>
> However, there are other use cases, too, basically any of the normal
> reasons why extensibility is useful and desirable. You might need to
> get some information out of a query plan that 99% of people don't care
> about. You could come up with your own way of formatting a query plan,
> but that's a big pain. It's a lot nicer if you can just add the detail
> that you care about to the EXPLAIN output without needing to modify
> PostgreSQL itself. Even if you think of something that really ought to
> be included in the EXPLAIN output by PostgreSQL, you can roll an
> extension out much quicker than you can get a change upstreamed and
> released. So I think EXPLAIN extensibility is, as a general concept,
> useful.
>
> So here are some patches.
>
> 0001 allows a loadable module to register new EXPLAIN options.
> Currently, EXPLAIN (FUNGUS) will error out, but if you want to make it
> work, this patch is for you. This patch also allows you to stash some
> state related to your new option, or options, in the ExplainState.
> Core options have hard-coded structure members; e.g. EXPLAIN (BUFFERS)
> sets es->buffers. If you add EXPLAIN (FUNGUS), there won't be an
> es->fungus, but you can get about the same effect using the new
> facilities provided here.
>
> 0002 provides hooks that you can use to make your new EXPLAIN options
> actually do something. In particular, this adds a new hook that is
> called once per PlanState node, and a new hook that is called once per
> PlannedStmt. Each is called at an appropriate point for you to tack on
> more output after what EXPLAIN would already produce.
>
> 0003 adds a new contrib module called pg_overexplain, which adds
> EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE). I actually think this is
> quite useful for planner hacking, and maybe a few more options would
> be, too. Right now, if you want to see stuff that EXPLAIN doesn't
> clearly show, you have to use SET debug_print_plan = true, and that
> output is so verbose that finding the parts you actually want to see
> is quite difficult. Assuming it gives you the details you need,
> EXPLAIN (RANGE_TABLE) looks way, way better to me, and if we end up
> committing these patches I anticipate using this semi-regularly.
>
> There are plenty of debatable things in this patch set, and I mention
> some of them in the commit messages. The hook design in 0002 is a bit
> simplistic and could be made more complex; there's lots of stuff that
> could be added to or removed from 0003, much of which comes down to
> what somebody hacking on the planner would actually want to see. I'm
> happy to bikeshed all of that stuff; this is all quite preliminary and
> I'm not committed to the details. The only thing that would disappoint
> me is if somebody said "this whole idea of making EXPLAIN extensible
> is stupid and pointless and we shouldn't ever do it." I will argue
> against that vociferously. I think even what I have here is enough to
> disprove that hypothesis, but I have a bunch of ideas about how to do
> more. Some of those require additional infrastructure and are best
> proposed with that other infrastructure; some can be done with just
> this, but I ran out of time to code up examples so here is what I have
> got so far.
>
> Hope you like it, sorry if you don't.

"pg_overexplain"? I love this name! And the idea sounds like a natural
evolution, so +1.

Some questions:

One thing I am wondering is whether extensions should be required to
prefix their EXPLAIN option with the extension name to avoid
collisions.

If two extensions happen to choose the same name, it won't be possible
to use both simultaneously.

In what order would the options be applied? Would it be deterministic,
or weighted within the extension's configuration, or based on the
order of the options in the list?

Would explain extensions be capable of modifying pre-existing core
option output, or just append to output?

Should there be a way of determining which lines are output by which
option? An extension may produce output similar to core's, making it
difficult or impossible to discern which is which.

Do there need to be any security considerations so that things like
RLS don't inadvertently become leaky?
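
For concreteness, here is a minimal sketch of what registering a custom
option might look like under 0001/0002. The function and hook names below
are taken from my reading of the patch descriptions above, so treat them
as assumptions rather than a confirmed API:

    #include "postgres.h"
    #include "commands/explain.h"

    PG_MODULE_MAGIC;

    /* Sketch only; names are assumptions based on the 0001/0002 patches. */
    static int fungus_id = -1;

    static void
    fungus_option_handler(ExplainState *es, DefElem *opt, ParseState *pstate)
    {
        /* remember that EXPLAIN (FUNGUS) was requested */
        SetExplainExtensionState(es, fungus_id, (void *) 1);
    }

    void
    _PG_init(void)
    {
        fungus_id = GetExplainExtensionId("explain_fungus");
        RegisterExtensionExplainOption("fungus", fungus_option_handler);
    }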

Thom




Improve monitoring of shared memory allocations

2025-03-01 Thread Rahila Syed
Hi,

The 0001* patch improves the accounting for the shared memory allocated
for a hash table during hash_create.  pg_shmem_allocations tracks the
memory allocated by ShmemInitStruct, which, for shared hash tables, only
covers memory allocated for the hash directory and control structure via
ShmemInitHash.  The hash segments and buckets are allocated using
ShmemAllocNoError, which does not attribute the allocation to the hash
table and also does not add it to ShmemIndex.

Therefore, these allocations are not tracked in pg_shmem_allocations.
To improve this, include the allocation of segments and buckets in the
initial allocation of the shared memory for the hash table, in
ShmemInitHash.

This will result in pg_shmem_allocations representing the total size of
the initial hash table, including all the buckets and elements, instead
of just the directory size.
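
For example, with the 0001* patch the full allocation should show up under
the hash table's name (an illustrative query against the existing view):

    SELECT name, size, allocated_size
    FROM pg_shmem_allocations
    WHERE name = 'LOCK hash';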

Like ShmemAllocNoError, the shared memory allocated by ShmemAlloc is not
tracked by pg_shmem_allocations.  The 0002* patch replaces most of the
calls to ShmemAlloc with ShmemInitStruct to associate a name with the
allocations and ensure that they get tracked by pg_shmem_allocations.

I observed an improvement in total memory allocation by consolidating the
initial shared memory allocations for the hash table. For example, the
allocated size for the LOCK hash in hash_create decreased from 801664
bytes to 799616 bytes. Please find the attached patches, which I will add
to the March Commitfest.

Thank you,
Rahila Syed


0001-Account-for-initial-shared-memory-allocated-during-h.patch
Description: Binary data


0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch
Description: Binary data


Re: lwlocknames.h beautification attempt

2025-03-01 Thread Tom Lane
Gurjeet Singh  writes:
> On Sat, Mar 1, 2025 at 10:26 PM Tom Lane  wrote:
>> This looks reasonably in line with project style ...

> Should I create a commitfest entry for this patch, or is it
> uncontroversial enough and small enough to not warrant that?

The controversy would be more about whether it's worth bothering
with.  In the past we've taken some care to ensure that generated
files would pass pgindent's ideas of correct formatting, but
not more than that.  AFAICT from some quick tests, pgindent has
no opinions about layout of #define lines.

regards, tom lane




Re: Statistics Import and Export

2025-03-01 Thread Tom Lane
Jeff Davis  writes:
> On Sat, 2025-03-01 at 13:52 -0500, Greg Sabino Mullane wrote:
>> Also, anything trained to parse pg_dump output will have to learn
>> about the new SELECT pg_restore_ calls with their multi-line formats
>> (not 100% sure we don't have that anywhere, as things like "SELECT
>> setval" and "SELECT set_config" are single line, but there may be
>> existing things)

> That's an interesting point. What tools are currently trying to parse
> pg_dump output?

That particular argument needs to be rejected vociferously.  Otherwise
we could never make any change at all in what pg_dump emits.  I think
the standard has to be "if you parse pg_dump output, it's on you to
cope with any legal SQL".

I do grasp Greg's larger point that this is a big change in pg_dump's
behavior and will certainly break some expectations.  I kind of lean
to the position that we'll be sad in the long run if we don't change
the default, though.  What other part of pg_dump's output is not
produced by default?

regards, tom lane




Re: Incorrect result of bitmap heap scan.

2025-03-01 Thread Peter Geoghegan
On Fri, Feb 28, 2025 at 12:53 PM Matthias van de Meent
 wrote:
> Rebased again, now on top of head due to f7a8fc10.

Is everybody in agreement about committing and back patching this fix,
which simply disables the optimization altogether?

I myself don't see a better way, but thought I'd ask before proceeding
with review and commit.

-- 
Peter Geoghegan




Re: Incorrect result of bitmap heap scan.

2025-03-01 Thread Tom Lane
Peter Geoghegan  writes:
> Is everybody in agreement about committing and back patching this fix,
> which simply disables the optimization altogether?
> I myself don't see a better way, but thought I'd ask before proceeding
> with review and commit.

If you don't see a clear path forward, then "disable" is the only
reasonable choice for the back branches.  Maybe we'll find a fix
in future, but it seems unlikely that it'd be back-patchable.

regards, tom lane




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-03-01 Thread Junwang Zhao
On Sat, Mar 1, 2025 at 10:50 AM Sutou Kouhei  wrote:
>
> Hi,
>
> Our 0001/0002 patches were merged into master. I've rebased
> on master. Can we discuss how to proceed with the rest of the patches?
>
> The contents of them aren't changed but I'll show a summary
> of them again:
>
> 0001-0003 are for COPY TO and 0004-0007 are for COPY FROM.
>
> For COPY TO:
>
> 0001: Add support for adding custom COPY TO format. This
> uses tablesample like handler approach. We've discussed
> other approaches such as USING+CREATE XXX approach but it
> seems that other approaches are overkill for this case.
>
> See also: 
> https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8
>
> 0002: Export CopyToStateData to implement custom COPY TO
> format as extension.
>
> 0003: Export a function and add a private space to
> CopyToStateData to implement custom COPY TO format as
> extension.
>
> We may want to squash 0002 and 0003, but splitting them makes them
> easier to review, because 0002 just moves existing code
> (with some renames) and 0003 just adds code. If we
> squash 0002 and 0003, moving and adding are mixed.
>
> For COPY FROM:
>
> 0004: This is COPY FROM version of 0001.
>
> 0005: 0002 has COPY_ prefix -> COPY_DEST_ prefix change for
> enum CopyDest. This is similar change for enum CopySource.
>
> 0006: This is COPY FROM version of 0003.
>
> 0007: This makes it easy to implement "ON_ERROR stop" and
> "LOG_VERBOSITY verbose" in an extension.
>
> We may want to squash 0005-0007 like for 0002-0003.
>
>
> Thanks,
> --
> kou


While reviewing another thread (Emitting JSON to file using COPY TO),
I found that the recently committed patches on this thread pass the
CopyFormatOptions struct directly, rather than a pointer to the struct,
as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.

Then I took a quick look at the newly rebased patch set and
found Sutou has already fixed this issue.

I'm wondering if we should fix this in a separate commit, since it
seems like an oversight in the previous patches.

-- 
Regards
Junwang Zhao




Re: Emitting JSON to file using COPY TO

2025-03-01 Thread Junwang Zhao
On Mon, Jan 27, 2025 at 4:17 PM jian he  wrote:
>
> hi.
>
> There are two ways we can represent the new copy format: json.
> 1.
> typedef struct CopyFormatOptions
> {
> boolbinary;/* binary format? */
> boolfreeze;/* freeze rows on loading? */
> boolcsv_mode;/* Comma Separated Value format? */
> booljson_mode;/* JSON format? */
> ...
> }
>
> 2.
> typedef struct CopyFormatOptions
> {
> CopyFormatformat;/* format of the COPY operation */
> .
> }
>
> typedef enum CopyFormat
> {
> COPY_FORMAT_TEXT = 0,
> COPY_FORMAT_BINARY,
> COPY_FORMAT_CSV,
> COPY_FORMAT_JSON,
> } CopyFormat;
>
> in both cases sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184,
> so the struct size will not influence performance.
>
> I also did some benchmarks when using CopyFormat.
> the following are the benchmarks info:
> ---
> create unlogged table t as select g from generate_series(1, 1_000_000) g;
>
> build_type=release patch:
> copy t to '/dev/null' json \watch i=0.1 c=10
> last execution Time: 108.741 ms
>
> copy t to '/dev/null' (format text)  \watch i=0.1 c=10
> last execution Time: 42.600 ms
>
> build_type=release master:
> copy t to '/dev/null' (format text)  \watch i=0.1 c=10
> last execution Time: 42.948 ms
>
> -
> so a new version is attached, using the struct CopyFormatOptions.
>
> changes mainly in CopyOneRowTo.
> now it is:
> 
> if(!cstate->rel)
> {
> memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
>TupleDescAttr(cstate->queryDesc->tupDesc, 0),
>cstate->queryDesc->tupDesc->natts *
> sizeof(FormData_pg_attribute));
> for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
> populate_compact_attribute(slot->tts_tupleDescriptor, i);
> BlessTupleDesc(slot->tts_tupleDescriptor);
> }
> 
> Reasoning for the change:
> for composite_to_json to construct the JSON key, we only need
> FormData_pg_attribute.attname,
> but the code path
> composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
> means we also need to call populate_compact_attribute to populate the
> other attributes.
>
> v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch
> is authored by Joel Jacobson.
> As I mentioned above,
> replacing the 3 bool fields with an enum didn't change the size of struct
> CopyFormatOptions, but it consolidates them into one enum and makes the
> code leaner. I think the refactoring (v14-0001) is worth it.

I've refactored the patch to adopt the newly introduced CopyToRoutine
struct; see 2e4127b6d2.

v15-0001 is v14-0001 and v14-0002 merged together.

There are some other ongoing *copy to/from* refactors[1] that we can
benefit from to make the code cleaner, especially the checks done in
ProcessCopyOptions.

[1]: 
https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com

-- 
Regards
Junwang Zhao


v15-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch
Description: Binary data


v15-0001-Introduce-json-format-for-COPY-TO.patch
Description: Binary data


Re: Statistics Import and Export

2025-03-01 Thread Corey Huinker
>
> Independently of that, do we want to switch over to storing
> reltuples as a string instead of converting it?  I still feel
> uncomfortable about the amount of baggage we added to pg_dump
> to avoid that.


I'm obviously a 'yes' vote for string, either a fixed-width buffer or
pg_strdup'd, for the reduced complexity. I'm not dismissing concerns about
memory usage, and we could free the RelStatsInfo structure after use, but
we're already not freeing the parent structures tbinfo or indxinfo,
probably because they're needed right up until the end of the program, and
there's no subsequent consumer for the memory that we'd be freeing up.


lwlocknames.h beautification attempt

2025-03-01 Thread Gurjeet Singh
Currently the contents of lwlocknames.h look like this:

#define ShmemIndexLock (&MainLWLockArray[1].lock)
#define OidGenLock (&MainLWLockArray[2].lock)
#define XidGenLock (&MainLWLockArray[3].lock)
#define ProcArrayLock (&MainLWLockArray[4].lock)
#define SInvalReadLock (&MainLWLockArray[5].lock)
...

This makes it a bit hard to read, since there is no attempt at vertical
alignment along the opening parentheses. It gets worse if the editor
performs syntax highlighting, and the various colored characters in the
definitions appear haphazardly arranged.

Had this been hand-written, I can imagine the author making an attempt at
aligning the definitions, but since this is a generated file, the lack of
that attempt is understandable.

I propose the following change to the generation script,
generate-lwlocknames.pl

-print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
+printf $h "#define %-30s %s\n", "${lockname}Lock",
"(&MainLWLockArray[$lockidx].lock)";

which produces the lock names in this format

#define ShmemIndexLock                 (&MainLWLockArray[1].lock)
#define OidGenLock                     (&MainLWLockArray[2].lock)
#define XidGenLock                     (&MainLWLockArray[3].lock)
#define ProcArrayLock                  (&MainLWLockArray[4].lock)
#define SInvalReadLock                 (&MainLWLockArray[5].lock)
...

Yet another format, which I prefer, can be achieved by right-aligning the
lock names. In addition to aligning the definitions, it also aligns the
'Lock' suffix in all the names. But as they say beauty is in the eye of the
beholder, so this one may be actively disliked by others.

-print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
+printf $h "#define %30s %s\n", "${lockname}Lock",
"(&MainLWLockArray[$lockidx].lock)";

#define                 ShmemIndexLock (&MainLWLockArray[1].lock)
#define                     OidGenLock (&MainLWLockArray[2].lock)
#define                     XidGenLock (&MainLWLockArray[3].lock)
#define                  ProcArrayLock (&MainLWLockArray[4].lock)
#define                 SInvalReadLock (&MainLWLockArray[5].lock)
...

The lockname column needs to be at least 30 chars wide to accommodate the
longest of the lock names 'DynamicSharedMemoryControlLock'.

Best regards,
Gurjeet
http://Gurje.et


Re: lwlocknames.h beautification attempt

2025-03-01 Thread Tom Lane
Gurjeet Singh  writes:
> I propose the following change to the generation script,
> generate-lwlocknames.pl
> ...
> which produces the lock names in this format

> #define ShmemIndexLock                 (&MainLWLockArray[1].lock)
> #define OidGenLock                     (&MainLWLockArray[2].lock)
> #define XidGenLock                     (&MainLWLockArray[3].lock)
> #define ProcArrayLock                  (&MainLWLockArray[4].lock)
> #define SInvalReadLock                 (&MainLWLockArray[5].lock)

This looks reasonably in line with project style ...

> Yet another format, which I prefer, can be achieved by right-aligning the
> lock names.

> #define                 ShmemIndexLock (&MainLWLockArray[1].lock)
> #define                     OidGenLock (&MainLWLockArray[2].lock)
> #define                     XidGenLock (&MainLWLockArray[3].lock)
> #define                  ProcArrayLock (&MainLWLockArray[4].lock)
> #define                 SInvalReadLock (&MainLWLockArray[5].lock)

... but that doesn't.  I challenge you to provide even one example
of that layout in our source tree, or to explain why it's better.

regards, tom lane




Re: lwlocknames.h beautification attempt

2025-03-01 Thread Gurjeet Singh
On Sat, Mar 1, 2025 at 10:26 PM Tom Lane  wrote:
>
> Gurjeet Singh  writes:
> > I propose the following change to the generation script,
> > generate-lwlocknames.pl
> > ...
> > which produces the lock names in this format
>
> > #define ShmemIndexLock                 (&MainLWLockArray[1].lock)
> > #define OidGenLock                     (&MainLWLockArray[2].lock)
> > #define XidGenLock                     (&MainLWLockArray[3].lock)
> > #define ProcArrayLock                  (&MainLWLockArray[4].lock)
> > #define SInvalReadLock                 (&MainLWLockArray[5].lock)
>
> This looks reasonably in line with project style ...

Should I create a commitfest entry for this patch, or is it
uncontroversial enough and small enough to not warrant that?

> > Yet another format, which I prefer, can be achieved by right-aligning the
> > lock names.
>
> > #define                 ShmemIndexLock (&MainLWLockArray[1].lock)
> > #define                     OidGenLock (&MainLWLockArray[2].lock)
> > #define                     XidGenLock (&MainLWLockArray[3].lock)
> > #define                  ProcArrayLock (&MainLWLockArray[4].lock)
> > #define                 SInvalReadLock (&MainLWLockArray[5].lock)
>
> ... but that doesn't.  I challenge you to provide even one example
> of that layout in our source tree, or to explain why it's better.

I haven't seen this style in any other project, let alone in Postgres. I
just mentioned it since I personally liked it slightly better after
trying a couple of different styles, because of the aligned suffix in
the lock names.



Best regards,
Gurjeet
http://Gurje.et




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-03-01 Thread Peter Geoghegan
On Thu, Feb 27, 2025 at 7:57 PM Peter Geoghegan  wrote:
> On Thu, Feb 27, 2025 at 3:42 PM Robert Haas  wrote:
> > Good documentation could help.

Attached revision adds an example that shows how "Index Searches: N"
can vary. This appears in "14.1.2. EXPLAIN ANALYZE".

Other changes in this revision:

* Improved commit message.

* We now consistently show "Index Searches: N" after all other scan
related output, so that it will reliably appear immediately before the
"Buffers: " line.

This seemed slightly better, since it is often useful to consider
these two numbers together.
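
So a node's output would look roughly like this (made-up numbers, purely
for illustration):

    ->  Index Scan using tenk1_unique1 on tenk1
          Index Cond: (unique1 = ANY ('{1,42,1000}'::integer[]))
          Index Searches: 3
          Buffers: shared hit=12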

My current plan is to commit this patch on Wednesday or Thursday,
barring any objections.

Thanks
--
Peter Geoghegan


0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
Description: Binary data


Re: access numeric data in module

2025-03-01 Thread Jelte Fennema-Nio
On Sat, 1 Mar 2025 at 17:33, Tom Lane  wrote:
> But IMO you still haven't made an acceptable case
> for deciding that these data structures aren't private to numeric.c.
> What behaviors do you actually need that aren't accessible via the
> existing exported functions?

FWIW in pg_duckdb we would definitely have liked to have access to
some of the currently unexposed numeric internals. We vendored in some
of the definitions that we required ourselves[1], but it would be very
nice if we didn't have to. I cannot speak for Ed's reasoning, but for
pg_duckdb the reason we cannot use many of the already exposed
functions is that they are not thread-safe. So we created our own
thread-safe way of constructing these values. Another reason is so we can construct
numeric types from int128/uint128 types efficiently. I would be very
happy if we'd have these definitions available, even if they would
change in a future PG version (that's what you sign up for as an
extension author). And indeed as Robert already said, numeric seems to
be the only type that has this kind of restriction on viewing the
internal representation.

[1]: 
https://github.com/duckdb/pg_duckdb/blob/main/include/pgduckdb/vendor/pg_numeric_c.hpp




Re: access numeric data in module

2025-03-01 Thread Ed Behn
Tom-
I understand that you are concerned about future maintenance costs vs
benefits of this change. I hope that I can address those concerns. An
important thing to note is that there are two different structures that
represent numeric values:
* NumericData is an opaque structure that is defined in numeric.c. It
is this struct that is used to store values. The patch I submitted has this
structure remain opaque and in numeric.c. Its internals are messy and
subject to future changes. I agree that third parties should not have
access to this. Of note is that the type Numeric is a typedef of
NumericData*.
* NumericVar is a user-friendly structure that already exists. It is
this structure that I propose moving to numeric.h. There are functions that
exist to convert it to and from NumericData. It is these functions that I
propose giving access to.

What the patch boils down to is the movement of NumericVar to numeric.h,
along with declarations for the basic functions that work with it and
a few pre-processor definitions.

I agree that there is the potential for future maintenance costs here.
However, any future changes to NumericData would necessitate updating the
code to convert to and from NumericVar regardless of the proposed changes.
I think that this small increase in costs is outweighed by the benefits of
allowing third parties to access this powerful datatype.

As for the reason that I would like to make this change: I am the
maintainer of the PL/Haskell extension. (It allows the use of Haskell code
as a procedural language. https://github.com/ed-o-saurus/PLHaskell) In the
extension, users can currently pass several Postgres types and also have
the function return them. This is accomplished by using the functions and
macros that convert between Datums and C data types. (For example
DatumGetFloat8 and Float8GetDatum to handle double-precision floating point
values) I would like to add support for the use of the numeric type to the
extension. To this end, I would need to create a Haskell type that mirrors
the Postgres numeric type. Passed Haskell values would be instantiated by
reading the data from Postgres. Conversely, returned values would be
converted to the Postgres type. Internally, users would be able to perform
mathematical operations with the Haskell values like any other type.
Currently, there is no way for a third-party extension to get needed
information about numeric values or build new numeric values. The proposed
changes would remedy this.
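
For illustration, extension code could then build a Numeric along these
lines -- a sketch assuming NumericVar and the helpers around it (init_var,
int64_to_numericvar, free_var, and the make_result conversion) become
visible through numeric.h as proposed; the exact exported names are
assumptions:

    #include "postgres.h"
    #include "utils/numeric.h"

    /* Hypothetical extension code, not the patch itself. */
    static Numeric
    int64_to_numeric_value(int64 val)
    {
        NumericVar  var;
        Numeric     result;

        init_var(&var);
        int64_to_numericvar(val, &var);
        result = make_result(&var);     /* convert NumericVar -> Numeric */
        free_var(&var);
        return result;
    }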

An alternative approach would be to make available calls to read and
create numeric data. However, as the NumericVar struct already exists, I
feel that utilizing it is the more natural approach.

What do you think?
  -Ed


On Sat, Mar 1, 2025 at 3:32 PM Ed Behn  wrote:

> Tom-
> I think that I can allay your concerns. Please give me a day or so to
> put together my more complete thoughts on the matter. I'll be in touch.
>
> -Ed
>
> On Sat, Mar 1, 2025 at 11:33 AM Tom Lane  wrote:
>
>> Ed Behn  writes:
>> >> There is actually no new code. Code is simply moved from numeric.c to
>> >> numeric.h.
>>
>> I will absolutely not hold still for that.  It would mean that any
>> time we want to think about messing with the contents of numerics,
>> we need to examine more or less the whole Postgres code base to see
>> what else is poking into those structures.
>>
>> If we must do something like this, then a separate header
>> "numeric_internal.h" or something like that would reduce the blast
>> radius for changes.  But IMO you still haven't made an acceptable case
>> for deciding that these data structures aren't private to numeric.c.
>> What behaviors do you actually need that aren't accessible via the
>> existing exported functons?
>>
>> regards, tom lane
>>
>


Re: bug when apply fast default mechanism for adding new column over domain with default value

2025-03-01 Thread Tom Lane
jian he  writes:
> So I duplicated StoreAttrDefault and removed the pg_attribute.atthasdef
> and pg_attrdef related code,
> and it works fine.

I think the fundamental problem here is that StoreAttrDefault
shouldn't be involved in this in the first place.  It looks like
somebody did that in the hopes of avoiding two updates of the
new pg_attribute row (one to set atthasdef and the other to fill
attmissingval), but it's just bad code structure.  We should
take that code out of StoreAttrDefault, not duplicate it, because
the assumption that the missingval is identical to what goes into
pg_attrdef is just wrong.
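
For anyone following along, a minimal sketch of the kind of case under
discussion (hypothetical object names):

    CREATE DOMAIN dint AS int DEFAULT 42;
    CREATE TABLE t (a int);
    INSERT INTO t VALUES (1);
    ALTER TABLE t ADD COLUMN b dint;  -- fast default: attmissingval must be
                                      -- the evaluated domain default, which
                                      -- is not the same as a pg_attrdef entry
    SELECT b FROM t;                  -- expected: 42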

(We could possibly get back down to one update by moving code
in ATExecAddColumn so that we know before calling
InsertPgAttributeTuples whether the column will have a non-null
default, and then we could set atthasdef correctly in the
originally-inserted tuple.  That's an optimization though,
not part of the basic bug fix.)

Also, since the "if (RELKIND_HAS_STORAGE(relkind))" stanza
in ATExecAddColumn is already calling expression_planner,
we should be able to avoid doing that twice on the way to
discovering whether the expression is a constant.

I kind of feel that StoreAttrMissingVal belongs in heap.c;
it's got about nothing to do with pg_attrdef.

regards, tom lane




Re: Space missing from EXPLAIN output

2025-03-01 Thread Robert Haas
On Fri, Feb 28, 2025 at 12:12 PM Thom Brown  wrote:
> Rebased and attached.

Thanks, committed. Sorry for the mistake and thanks for the patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: SIMD optimization for list_sort

2025-03-01 Thread Shankaran, Akash
>> > I don't think "another extension might use it someday" makes a very strong 
>> > case,
>> > particularly for something that requires a new dependency.
>>
>> The x86-simdsort library is an optional dependency in Postgres. Also the new 
>> list sort implementation which uses the x86-simdsort library does not impact 
>> any of the existing workflows in Postgres.

>"Optional" and "Does not impact" are not great selling points to get
>us to take a 1500 line patch. As we told you in November, list_sort
>isn't critical for us. You need to start with the user and work
>backwards to the technology. Don't pick a technology and try to sell
>people on using it.

Agreed on starting from the user problem and working towards the technology. As stated 
upthread, the problem being addressed here is optimizing pg_vector list_sort 
(which relies on postgres list_sort) to speed up HNSW index construction. The 
results show 7-10% improvements on vector workloads using the library. 

Given that the same x86-simdsort library is intended to optimize 1) list_sort
and 2) tuple sort, it didn't make sense to duplicate the work to integrate it
in pg_vector for list_sort, and then again in postgres for tuple_sort.
We're trying to tackle list_sort and tuple_sort in separate patches using the
same x86-simdsort library, with the goal of optimizing both. Let us know if
this approach is preferred, whether the list_sort patch can be reviewed, and
what other tests we should run -- or would you rather see tuple_sort
benchmarks?

- Akash Shankaran


Re: Replace IN VALUES with ANY in WHERE clauses during optimization

2025-03-01 Thread Alena Rybakina

On 28.02.2025 14:48, Alena Rybakina wrote:


Hi!

On 21.02.2025 00:09, Alena Rybakina wrote:


Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed 
too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), 
(1)));


I added it and attached a patch with a diff file. To be honest, I didn't
find any queries where the transform can't be applied, apart from those
involving Vars and volatile functions.


I removed the function volatility check that I added in the previous 
version, since we already check it in is_simple_values_sequence.


I'm only unsure about cases where a Var can refer to something outside
the available_rels list, but I couldn't come up with an example where
that's possible. What do you think?


Considering it again, I think we can't face problems like that because
we don't work with joins.


I attached a diff file showing the difference from the 3rd version of the
patch, in which we did not yet consider VALUES containing Vars for
transformation.
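
For reference, the transformation in its simplest form (per the regression
tests below) rewrites

    SELECT * FROM onek t WHERE t.ten IN (VALUES (1), (3));

so that the plan filters with something like (ten = ANY ('{1,3}'::integer[]))
instead of joining against a Values Scan.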


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 08c4648f936..7c1b2f4cfe6 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1215,8 +1215,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 }
 
 /*
- * The function traverses the tree looking for elements of type var.
- * If it finds it, it returns true.
+ * The function traverses the tree looking for subqueries particularly
+ * elements of type RangeTblEntry. If it finds it, it returns true.
  */
 static bool
 values_simplicity_check_walker(Node *node, void *ctx)
@@ -1225,7 +1225,7 @@ values_simplicity_check_walker(Node *node, void *ctx)
{
return false;
}
-   else if(IsA(node, Var))
+   else if(IsA(node, RangeTblEntry))
return true;
else if(IsA(node, Query))
return query_tree_walker((Query *) node,
@@ -1304,6 +1304,16 @@ convert_VALUES_to_ANY(Query *query, Node *testexpr)
 
value = eval_const_expressions(NULL, value);
 
+   if (IsA(value, Var))
+   {
+       /*
+        * Upper-level vars will now be one level closer to their
+        * parent than before; in particular, anything that had been
+        * level 1 becomes level zero.
+        */
+       IncrementVarSublevelsUp(value, -1, 1);
+   }
+
if (!IsA(value, Const))
have_param = true;
else if (((Const *)value)->constisnull)
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 5138a00349c..63e0114f177 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -3095,3 +3095,14 @@ SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3));
  ->  Values Scan on "*VALUES*"
 (5 rows)
 
+EXPLAIN (COSTS OFF)
+SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1)));
+QUERY PLAN
+--
+ Nested Loop
+   Join Filter: (t2.ten = ANY (ARRAY[t1.ten, 1]))
+   ->  Seq Scan on onek t1
+   ->  Materialize
+ ->  Seq Scan on onek t2
+(5 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 480f8d7b852..088619f0ffc 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1357,3 +1357,6 @@ SELECT ten FROM onek t WHERE 1.0 IN ((VALUES (1), (3))::integer);
 
 EXPLAIN (COSTS OFF)
 SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3));
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1)));
From d000645cb3da4c932bccb98ab39be890aa96383f Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Sat, 1 Mar 2025 14:28:58 +0300
Subject: [PATCH 2/2] Add an implementation of the x IN VALUES to 'x = ANY 
 [...]', a special case of the ScalarArrayOpExpr operation.

It will simplify the query tree, eliminating the appearance of
an unnecessary join.

Since VALUES describes a relational table, and the value of such
a list is a table row, the optimizer will likely face an underestimation
problem due to the inability to estimate cardinality through
MCV statistics. The cardinality evaluation mechanism
can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform
a statistical evaluation element by element.

We perform the transformation in the convert_ANY_sublink_to_join
if VALUES RTE is proper and the transformation is convertible.
The conversion is only possible for operations on scalar va

Re: [PATCH] Add get_bytes() and set_bytes() functions

2025-03-01 Thread Dean Rasheed
On Fri, 24 Jan 2025 at 13:00, Aleksander Alekseev
 wrote:
>
> Thank you. Here is the corrected patch.
>

This looks pretty good to me. I have just a couple of minor comments:

* The loop in bytea_integer() can be written more simply as a "for"
loop. Given that it's only a few lines of code, it might as well just
be coded directly in each cast function, which avoids the need to go
via a 64-bit integer for every case. In addition, it should use the
BITS_PER_BYTE macro rather than "8". Doing that leads to code that's
consistent with bittoint4() and bittoint8().

* In pg_proc.dat, it's not necessary to write "proleakproof => 'f'",
because that's the default, and no other function does that.

* I think it's worth using slightly more non-trivial doc examples,
including positive and negative cases, to make the behaviour more
obvious.

* I also tweaked the regression tests a bit, and copied the existing
test style which displays both the expected and actual results from
each test.

With those updates, I think this is ready for commit, which I'll do in
a day or two, if there are no further comments.
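
To recap the proposed behavior with a few examples (consistent with the
patch's documentation below):

    SELECT 1234::smallint::bytea;           -- \x04d2
    SELECT (-1234)::smallint::bytea;        -- \xfb2e (two's complement, MSB first)
    SELECT '\x8000'::bytea::smallint;       -- -32768
    SELECT '\x8000'::bytea::integer;        -- 32768 (shorter input, no sign extension)
    SELECT '\x0102030405'::bytea::integer;  -- ERROR: integer out of range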

Regards,
Dean
From 5c0572a0930067f7244606384542d670cd1e4aa6 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Sat, 1 Mar 2025 10:49:24 +
Subject: [PATCH v10] Allow casting between bytea and integer types.

This allows smallint, integer, and bigint values to be cast to and
from bytea. The bytea value is the two's complement representation of
the integer, with the most significant byte first. For example:

  1234::bytea -> \x04d2
  (-1234)::bytea -> \xfb2e

Author: Aleksander Alekseev 
Reviewed-by: Joel Jacobson 
Reviewed-by: Yugo Nagata 
Reviewed-by: Peter Eisentraut 
Reviewed-by: Michael Paquier 
Reviewed-by: Dean Rasheed 
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
---
 doc/src/sgml/func.sgml   |  17 
 src/backend/utils/adt/varlena.c  |  90 
 src/include/catalog/pg_cast.dat  |  14 
 src/include/catalog/pg_proc.dat  |  19 +
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/strings.out| 102 +++
 src/test/regress/sql/strings.sql |  29 +++
 7 files changed, 274 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0e6c534965..6b15f167d0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5035,6 +5035,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  <para>
+   In addition, it is possible to cast integral values to and from type
+   <type>bytea</type>. Casting an integer to <type>bytea</type> produces
+   2, 4, or 8 bytes, depending on the width of the integer type. The result
+   is the two's complement representation of the integer, with the most
+   significant byte first. Some examples:
+<programlisting>
+1234::smallint::bytea      \x04d2
+cast(1234 as bytea)        \x04d2
+cast(-1234 as bytea)       \xfb2e
+'\x8000'::bytea::smallint  -32768
+'\x8000'::bytea::integer   32768
+</programlisting>
+   Casting a <type>bytea</type> to an integer will raise an error if the
+   length of the <type>bytea</type> exceeds the width of the integer type.
+  </para>
+
   
See also the aggregate function string_agg in
 and the large object functions
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index e455657170..d22648a7e4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4057,6 +4057,96 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint16		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("smallint out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT16(result);
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint32		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("integer out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT32(result);
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint64		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("bigint out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT64(result);
+}
+
+/* Cast int2 -> byte

Re: SQLFunctionCache and generic plans

2025-03-01 Thread Pavel Stehule
Hi

On Fri, Feb 28, 2025 at 7:29 AM Pavel Stehule  wrote:

> Hi
>
> On Thu, Feb 27, 2025 at 9:45 PM Tom Lane  wrote:
>
>> Pavel Stehule  writes:
>> > On Thu, Feb 27, 2025 at 8:52 PM Tom Lane  wrote:
>> >> So taken together, our results are all over the map, anywhere
>> >> from 7% speedup to 7% slowdown.  My usual rule of thumb is that
>>
>> > Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
>>
>> Alexander got that on the fx4 case, according to his response a
>> few messages ago [1].  It'd be good if someone else could reproduce
>> that, because right now we have two "it's slower" results versus
>> only one "it's faster".
>>
>
> ok
>
> here is a profile from master
>
> 6.98%  postgres  postgres   [.] hash_bytes
>6.30%  postgres  postgres   [.] palloc0
>3.57%  postgres  postgres   [.] SearchCatCacheInternal
>3.29%  postgres  postgres   [.] AllocSetAlloc
>2.65%  postgres  plpgsql.so [.] exec_stmts
>2.55%  postgres  postgres   [.] expression_tree_walker_impl
>2.34%  postgres  postgres   [.] _SPI_execute_plan
>2.13%  postgres  postgres   [.] CheckExprStillValid
>2.02%  postgres  postgres   [.] fmgr_info_cxt_security
>1.89%  postgres  postgres   [.] ExecInitFunc
>1.51%  postgres  postgres   [.] ExecInterpExpr
>1.48%  postgres  postgres   [.] ResourceOwnerForget
>1.44%  postgres  postgres   [.] AllocSetReset
>1.35%  postgres  postgres   [.] MemoryContextCreate
>1.30%  postgres  plpgsql.so [.] plpgsql_exec_function
>1.29%  postgres  libc.so.6  [.] __memcmp_sse2
>1.24%  postgres  postgres   [.] MemoryContextDelete
>1.13%  postgres  postgres   [.] check_stack_depth
>1.11%  postgres  postgres   [.] AllocSetContextCreateInternal
>1.09%  postgres  postgres   [.] resolve_polymorphic_argtypes
>1.08%  postgres  postgres   [.] hash_search_with_hash_value
>1.07%  postgres  postgres   [.] standard_ExecutorStart
>1.07%  postgres  postgres   [.] ExprEvalPushStep
>1.04%  postgres  postgres   [.] ExecInitExprRec
>0.95%  postgres  plpgsql.so [.] plpgsql_estate_setup
>0.91%  postgres  postgres   [.] ExecReadyInterpretedExp
>
> and from patched
>
>7.08%  postgres  postgres   [.] hash_bytes
>6.25%  postgres  postgres   [.] palloc0
>3.52%  postgres  postgres   [.] SearchCatCacheInternal
>3.30%  postgres  postgres   [.] AllocSetAlloc
>2.39%  postgres  postgres   [.] expression_tree_walker_impl
>2.37%  postgres  plpgsql.so [.] exec_stmts
>2.15%  postgres  postgres   [.] _SPI_execute_plan
>2.10%  postgres  postgres   [.] CheckExprStillValid
>1.94%  postgres  postgres   [.] fmgr_info_cxt_security
>1.71%  postgres  postgres   [.] ExecInitFunc
>1.41%  postgres  postgres   [.] AllocSetReset
>1.40%  postgres  postgres   [.] ExecInterpExpr
>1.38%  postgres  postgres   [.] ExprEvalPushStep
>1.34%  postgres  postgres   [.] ResourceOwnerForget
>1.31%  postgres  postgres   [.] MemoryContextDelete
>1.24%  postgres  libc.so.6  [.] __memcmp_sse2
>1.21%  postgres  postgres   [.] MemoryContextCreate
>1.18%  postgres  postgres   [.] AllocSetContextCreateInternal
>1.17%  postgres  postgres   [.] hash_search_with_hash_value
>1.13%  postgres  postgres   [.] resolve_polymorphic_argtypes
>1.13%  postgres  plpgsql.so [.] plpgsql_exec_function
>1.03%  postgres  postgres   [.] standard_ExecutorStart
>0.98%  postgres  postgres   [.] ExecInitExprRec
>0.96%  postgres  postgres   [.] check_stack_depth
>
> it looks like there is only one significant difference:
>
> ExprEvalPushStep: 1.07% vs. 1.38%
>
> Regards
>
> Pavel
>
> compiled without assertions on gcc 15 with 02
>
> vendor_id : GenuineIntel
> cpu family : 6
> model : 42
> model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
> stepping : 7
> microcode : 0x2f
> cpu MHz : 2691.102
> cache size : 6144 KB
>
>
I tested the patches on another notebook with more recent cpu

vendor_id : GenuineIntel
cpu family : 6
model : 154
model name : 12th Gen Intel(R) Core(TM) i7-12700H
stepping : 3
microcode : 0x436
cpu MHz : 400.000
cache size : 24576 KB

And the differences are smaller - about 3%.

Regards

Pavel


>
>
>
>> regards, tom lane
>>
>> [1]
>> https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
>>
>


Re: [PATCH] Add regression tests of ecpg command notice (error / warning)

2025-03-01 Thread Ryo Kanbayashi
On Fri, Feb 28, 2025 at 11:27 PM Fujii Masao
 wrote:
> On 2025/02/28 9:24, Ryo Kanbayashi wrote:
> > I have rewrote my patch on TAP test sttyle :)
> > File for build are also updated.
>
> Thanks for updating the patch!

Thanks for review:)

>
> +'tests': [
> +  't/001_ecpg_notice.pl',
> +  't/002_ecpg_notice_informix.pl',
>
> Since neither test emits "notice" messages, shouldn't the test file
> names be revised to reflect this?

I replaced "notice" to "err_warn_msg"

> Also, I'm unsure if it's ideal to place input files directly under
> the "t" directory. I looked for similar TAP tests with input files,
> but I couldn't find any examples to guide this decision...

I couldn't find any either, so the locations are unchanged.

> +program_help_ok('ecpg');
> +program_version_ok('ecpg');
> +program_options_handling_ok('ecpg');
> +command_fails(['ecpg'], 'ecpg without arguments fails');
>
> These checks seem unnecessary in 002 since they're already covered in 001.

I've applied that change.

---
Great regards,
Ryo Kanbayashi


ecpg-notice-regress-patch-tap-ver-rebased.patch
Description: Binary data


Re: RFC: Additional Directory for Extensions

2025-03-01 Thread Gabriele Bartolini
Hi everyone,

I have finally been able to test the patch in a Kubernetes environment with
CloudNativePG, thanks to Niccolò Fei and Marco Nenciarini, who created a
pilot patch for CloudNativePG (
https://github.com/cloudnative-pg/cloudnative-pg/pull/6546).

In the meantime, Kubernetes is likely adding the ImageVolume feature
starting from the upcoming version 1.33. I will write a blog post soon
about how CloudNativePG will benefit from this feature. See
https://github.com/kubernetes/enhancements/issues/4639.

Although the steps are not easily reproducible by everyone, I can confirm
that I successfully mounted a volume in the Postgres pod using a container
image that includes only pgvector (with a size of 1.6MB - see
https://github.com/EnterpriseDB/pgvector/blob/dev/5645/Dockerfile.cnpg).

By setting:

dynamic_library_path = '$libdir:/extensions/pgvector/lib'
extension_control_path = '$system:/extensions/pgvector/share'

I was able to run the following queries:

postgres=# SELECT * FROM pg_available_extensions WHERE name = 'vector';
-[ RECORD 1 ]-+-
name  | vector
default_version   | 0.8.0
installed_version |
comment   | vector data type and ivfflat and hnsw access methods


I also successfully ran the following:

postgres=# SELECT * FROM pg_extension_update_paths('vector');

By emptying the content of `extension_control_path`, the vector extension
disappeared from the list.

postgres=# SHOW extension_control_path ;
 extension_control_path

 $system
(1 row)

postgres=# SELECT * FROM pg_available_extensions WHERE name = 'vector';
 name | default_version | installed_version | comment
--+-+---+-
(0 rows)

postgres=# SELECT * FROM pg_available_extension_versions WHERE name =
'vector';
 name | version | installed | superuser | trusted | relocatable | schema |
requires | comment
--+-+---+---+-+-++--+-
(0 rows)

In my opinion, the patch already helps a lot and does what I can reasonably
expect from a first iteration of improvements in enabling immutable
container images that ship a self-contained extension to be dynamically
loaded and unloaded from a Postgres cluster in Kubernetes. From here, it is
all about learning how to improve things with an exploratory mindset with
future versions of Postgres and Kubernetes. It's a long journey but this is
a fundamental step in the right direction.

Let me know if there's more testing for me to do. The documentation looks
clear to me.

Thank you to everyone who contributed to this patch, from the initial
discussions to the development phase. I sincerely hope this is included in
Postgres 18.

Ciao,
Gabriele

On Fri, 28 Feb 2025 at 16:36, Matheus Alcantara 
wrote:

> Hi
>
> On Tue, Feb 25, 2025 at 5:29 PM Matheus Alcantara
>  wrote:
> > Fixed on the attached v3.
> >
> After I've sent the v3 patch I noticed that the tests were failing on
> windows.
> The problem was on TAP test that was using ":" as a separator on
> extension_control_path and also the path was not being escaped correctly
> resulting in a wrong path configuration.
>
> Attached v4 with all these fixes.
>
> --
> Matheus Alcantara
>


-- 
Gabriele Bartolini
VP, Chief Architect, Kubernetes
enterprisedb.com


Re: Separate GUC for replication origins

2025-03-01 Thread Amit Kapila
On Thu, Feb 13, 2025 at 6:48 AM Masahiko Sawada  wrote:
>
> Thank you for updating the patch! The patch looks mostly good to me.
>

+ /*
+ * Prior to PostgreSQL 18, max_replication_slots was used to set the
+ * number of replication origins. For backward compatibility, -1 indicates
+ * to use the fallback value (max_replication_slots).
+ */
+ if (max_replication_origin_sessions == -1)

Shouldn't we let users use max_replication_origin_sessions for
subscribers? Maintaining this mapping for a long time can create
confusion such that some users will keep using max_replication_slots
for origins, and others will start using
max_replication_origin_sessions. Such a mapping would be useful if we
were doing something like this in back-branches, but doing it in a
major version appears to be more of a maintenance burden.
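
In configuration terms, the mapping under discussion looks like this
(sketch, using the GUC name from this thread):

    # postgresql.conf on a subscriber
    max_replication_origin_sessions = 10    # -1 (the default) falls back to
                                             # max_replication_slots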

-- 
With Regards,
Amit Kapila.




Re: DOCS - Generated Column Replication Examples

2025-03-01 Thread Amit Kapila
On Mon, Feb 3, 2025 at 4:23 AM Peter Smith  wrote:
>
> A recent commit [1] added a new section "29.6. Generated Column
> Replication" to the documentation [2]. But, no "Examples" were
> included.
>
> I've created this new thread to make a case for adding an "Examples"
> subsection to that page.
>
> (the proposed examples patch is attached)
>
> ==
>
> 1. Reasons AGAINST Adding Examples to this section (and why I disagree):
>
> 1a. The necessary information to describe the feature can already be
> found in the text.
>

I still feel that the current text and example given on the page are
sufficient for now. This is a new feature, and I think it is better to
get some feedback from users. We can add more examples or information
based on real user feedback. Also, we haven't got any other response
on this thread yet. So, I think we can wait for a week or so more to
see if anyone else also thinks that the additional examples will be
useful; if not, then return this patch for now.

-- 
With Regards,
Amit Kapila.




Re: access numeric data in module

2025-03-01 Thread Ed Behn
Upon further review, I've updated the patch. This avoids possible name
conflicts with other functions. See attached.

On Sat, Feb 8, 2025 at 3:24 PM Ed Behn  wrote:

> I've created a patch (attached) to implement the changes discussed below.
>
> This change moves the definition of the NumericVar structure to numeric.h.
> Function definitions for functions used to work with NumericVar are also
> moved to the header as are definitions of functions used to convert
> NumericVar to Numeric. (Numeric is used to store numeric and decimal types.)
>
> All of this is so that third-party libraries can access numeric and
> decimal values without having to access the opaque Numeric structure.
>
> There is actually no new code. Code is simply moved from numeric.c to
> numeric.h.
>
> This is a patch against branch master.
>
> This successfully compiles and is tested with regression tests.
>
> Also attached is a code sample that uses the change.
>
> Please provide feedback. I'm planning to submit this for the March
> commitfest.
>
>-Ed
>
> On Wed, Sep 18, 2024 at 9:50 AM Robert Haas  wrote:
>
>> On Sat, Sep 14, 2024 at 2:10 PM Ed Behn  wrote:
>> > Was there a resolution of this? I'm wondering if it is worth it for
>> me to submit a PR for the next commitfest.
>>
>> Well, it seems like what you want is different than what I want, and
>> what Tom wants is different from both of us. I'd like there to be a
>> way forward here but at the moment I'm not quite sure what it is.
>>
>> --
>> Robert Haas
>> EDB: http://www.enterprisedb.com
>>
>
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 40dcbc7b671..ec5e4b5ee32 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -45,65 +45,11 @@
 
 /* --
  * Uncomment the following to enable compilation of dump_numeric()
- * and dump_var() and to get a dump of any result produced by make_result().
+ * and dump_var() and to get a dump of any result produced by numeric_make_result().
  * --
 #define NUMERIC_DEBUG
  */
 
-
-/* --
- * Local data types
- *
- * Numeric values are represented in a base-NBASE floating point format.
- * Each "digit" ranges from 0 to NBASE-1.  The type NumericDigit is signed
- * and wide enough to store a digit.  We assume that NBASE*NBASE can fit in
- * an int.  Although the purely calculational routines could handle any even
- * NBASE that's less than sqrt(INT_MAX), in practice we are only interested
- * in NBASE a power of ten, so that I/O conversions and decimal rounding
- * are easy.  Also, it's actually more efficient if NBASE is rather less than
- * sqrt(INT_MAX), so that there is "headroom" for mul_var and div_var to
- * postpone processing carries.
- *
- * Values of NBASE other than 10000 are considered of historical interest only
- * and are no longer supported in any sense; no mechanism exists for the client
- * to discover the base, so every client supporting binary mode expects the
- * base-10000 format.  If you plan to change this, also note the numeric
- * abbreviation code, which assumes NBASE=10000.
- * --
- */
-
-#if 0
-#define NBASE		10
-#define HALF_NBASE	5
-#define DEC_DIGITS	1			/* decimal digits per NBASE digit */
-#define MUL_GUARD_DIGITS	4	/* these are measured in NBASE digits */
-#define DIV_GUARD_DIGITS	8
-
-typedef signed char NumericDigit;
-#endif
-
-#if 0
-#define NBASE		100
-#define HALF_NBASE	50
-#define DEC_DIGITS	2			/* decimal digits per NBASE digit */
-#define MUL_GUARD_DIGITS	3	/* these are measured in NBASE digits */
-#define DIV_GUARD_DIGITS	6
-
-typedef signed char NumericDigit;
-#endif
-
-#if 1
-#define NBASE		10000
-#define HALF_NBASE	5000
-#define DEC_DIGITS	4			/* decimal digits per NBASE digit */
-#define MUL_GUARD_DIGITS	2	/* these are measured in NBASE digits */
-#define DIV_GUARD_DIGITS	4
-
-typedef int16 NumericDigit;
-#endif
-
-#define NBASE_SQR	(NBASE * NBASE)
-
 /*
  * The Numeric type as stored on disk.
  *
@@ -252,75 +198,6 @@ struct NumericData
 	 | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
 	: ((n)->choice.n_long.n_weight))
 
-/*
- * Maximum weight of a stored Numeric value (based on the use of int16 for the
- * weight in NumericLong).  Note that intermediate values held in NumericVar
- * and NumericSumAccum variables may have much larger weights.
- */
-#define NUMERIC_WEIGHT_MAX			PG_INT16_MAX
-
-/* --
- * NumericVar is the format we use for arithmetic.  The digit-array part
- * is the same as the NumericData storage format, but the header is more
- * complex.
- *
- * The value represented by a NumericVar is determined by the sign, weight,
- * ndigits, and digits[] array.  If it is a "special" value (NaN or Inf)
- * then only the sign field matters; ndigits should be zero, and the weight
- * and dscale fields are ignored.
- *
- * Note: the first digit of a NumericVar's value is assumed to be multiplied
- * by NBASE ** weight.  Another way to say it is that there are weig

Re: Amcheck verification of GiST and GIN

2025-03-01 Thread Kirill Reshke
On Fri, 28 Feb 2025 at 23:31, Mark Dilger  wrote:

> The only obvious definition of "wrong" for this is that gin index scans 
> return different result sets than table scans over the same data.  Using your 
> much smaller reproducible test case, and adding rows like:

Yeah, you are 100% right. Actually, along this thread, we have not
spotted any GIN bugs yet, only GIN amcheck bugs.

This turns out to be another GIN amcheck bug:

```
DEBUG:  comparing for offset 79  category 2  key attnum 1
DEBUG:  comparing for offset 80  category 3  key attnum 1
DEBUG:  comparing for offset 81  category 0  key attnum 2
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 81, rightlink 4294967295
DEBUG:  comparing for offset 82  category 0  key attnum 2

DEBUG:  comparing for offset 100  category 0  key attnum 2
DEBUG:  comparing for offset 101  category 2  key attnum 2
DEBUG:  comparing for offset 102  category 3  key attnum 2
DEBUG:  comparing for offset 103  category 0  key attnum 3
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 103, rightlink 4294967295
DEBUG:  comparing for offset 104  category 0  key attnum 3
DEBUG:  comparing for offset 105  category 0  key attnum 3
```
It turns out we were comparing page entries for different attributes in
gin_check_parent_keys_consistency.

The trivial fix is attached (see v37-0004): I now simply compare the
current and previous attribute numbers before comparing keys, as in the
sketch below. This resolves the issue discovered by
`v0-0001-Add-a-reproducible-test-case-for-verify_gin-error.patch.no_apply`.
However, the stress test still does not pass: on my PC it never ends,
and all processes are in DELETE waiting/UPDATE waiting state. I will
take another look tomorrow.
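
To make the fix idea concrete, a hedged sketch of the guard (names are
illustrative, not the actual v37-0004 hunk; ginCompareEntries() is
assumed to be the existing GIN entry-comparison helper):

```c
/*
 * Sketch: only enforce key ordering between adjacent entry-tree tuples
 * that belong to the same index attribute; entries of different
 * attributes are not mutually ordered by key.
 */
if (have_prev && attnum == prev_attnum)
{
	if (ginCompareEntries(&state, attnum, key, category,
						  prev_key, prev_category) <= 0)
		elog(LOG, "index \"%s\" has wrong tuple order on entry tree page",
			 RelationGetRelationName(rel));
}

/* remember this entry for the next within-attribute comparison */
have_prev = true;
prev_attnum = attnum;
prev_key = key;
prev_category = category;
```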



P.S. Just as I was about to send this message, I discovered that
v34-0003-Add-gist_index_check-function-to-verify-GiST-ind.patch &
v34-0005-Add-GiST-support-to-pg_amcheck.patch are now missing from this
patch series ;(

-- 
Best regards,
Kirill Reshke


v37-0004-Fix-for-gin_index_check.patch
Description: Binary data


v37-0003-Fix-wording-in-GIN-README.patch
Description: Binary data


v37-0002-Add-gin_index_check-to-verify-GIN-index.patch
Description: Binary data


v37-0001-Refactor-amcheck-internals-to-isolate-common-loc.patch
Description: Binary data


v37-0005-Add-gin-index-checking-test-for-jsonb-data.patch
Description: Binary data


v37-0006-Add-gin-to-the-create-index-concurrently-tap-tes.patch
Description: Binary data


v37-0007-Stress-test-verify_gin-using-pgbench.patch
Description: Binary data


Re: access numeric data in module

2025-03-01 Thread Tom Lane
Ed Behn  writes:
>> There is actually no new code. Code is simply moved from numeric.c to
>> numeric.h.

I will absolutely not hold still for that.  It would mean that any
time we want to think about messing with the contents of numerics,
we need to examine more or less the whole Postgres code base to see
what else is poking into those structures.

If we must do something like this, then a separate header
"numeric_internal.h" or something like that would reduce the blast
radius for changes.  But IMO you still haven't made an acceptable case
for deciding that these data structures aren't private to numeric.c.
What behaviors do you actually need that aren't accessible via the
existing exported functions?
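
For concreteness, a hedged sketch of what such an opt-in header might
look like (the struct is the existing NumericVar from numeric.c; the
two declarations are placeholders, since which functions to expose is
exactly the open question):

```c
/* src/include/utils/numeric_internal.h -- illustrative sketch only */
#ifndef NUMERIC_INTERNAL_H
#define NUMERIC_INTERNAL_H

#include "utils/numeric.h"

typedef int16 NumericDigit;

typedef struct NumericVar
{
	int			ndigits;		/* # of digits in digits[] - can be 0! */
	int			weight;			/* weight of first digit */
	int			sign;			/* positive, negative, or special value */
	int			dscale;			/* display scale */
	NumericDigit *buf;			/* start of palloc'd space for digits[] */
	NumericDigit *digits;		/* base-NBASE digits */
} NumericVar;

/* placeholder declarations; the exported set is the point of contention */
extern void init_var_from_num(Numeric num, NumericVar *dest);
extern Numeric numeric_make_result(const NumericVar *var);

#endif							/* NUMERIC_INTERNAL_H */
```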

regards, tom lane




Re: Statistics Import and Export

2025-03-01 Thread Alexander Lakhin

Hello Jeff,

26.02.2025 04:00, Jeff Davis wrote:

I plan to commit the patches soon.


It looks like 8f427187d broke pg_dump on Cygwin:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07

As far as I can see, it exits prematurely here:
   float   reltuples = strtof(PQgetvalue(res, i, i_reltuples), NULL);

because of:
/*
 * Cygwin has a strtof() which is literally just (float)strtod(), which means
 * we get misrounding _and_ silent over/underflow. Using our wrapper doesn't
 * fix the misrounding but does fix the error checks, which cuts down on the
 * number of test variant files needed.
 */
#define HAVE_BUGGY_STRTOF 1
...
#ifdef HAVE_BUGGY_STRTOF
extern float pg_strtof(const char *nptr, char **endptr);
#define strtof(a,b) (pg_strtof((a),(b)))
#endif

and:
float
pg_strtof(const char *nptr, char **endptr)
{
...
    if (errno)
    {
    /* On error, just return the error to the caller. */
    return fresult;
    }
    else if ((*endptr == nptr) || isnan(fresult) ||
...

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: Trigger more frequent autovacuums of heavy insert tables

2025-03-01 Thread Nathan Bossart
On Sat, Mar 01, 2025 at 08:57:52AM +0800, wenhui qiu wrote:
> Based on the comments, the pcnt_unfrozen value could potentially be 0,
> which would indicate that everything is frozen. Therefore, is it
> necessary to handle the case where the value is 0?

How so?  If it's 0, then the insert threshold calculation would produce
autovacuum_vacuum_insert_threshold, just like for an empty table.  That
seems like the intended behavior to me.
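
For reference, a hedged sketch of the calculation under discussion
(vac_ins_base_thresh, vac_ins_scale_factor, and pcnt_unfrozen follow
the names used in this thread; this is not the committed code):

```c
typedef float float4;

/*
 * Sketch of the insert-vacuum threshold with unfrozen-percentage
 * scaling.  When pcnt_unfrozen is 0 (everything frozen), the result is
 * just the base threshold, the same as for an empty table.
 */
static float4
insert_vacuum_threshold(float4 vac_ins_base_thresh,
						float4 vac_ins_scale_factor,
						float4 reltuples, float4 relallfrozen)
{
	float4		pcnt_unfrozen = 1.0f;

	if (reltuples > 0)
		pcnt_unfrozen = 1.0f - (relallfrozen / reltuples);

	return vac_ins_base_thresh +
		vac_ins_scale_factor * reltuples * pcnt_unfrozen;
}
```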

-- 
nathan




Re: [Patch] add new parameter to pg_replication_origin_session_setup

2025-03-01 Thread Doruk Yilmaz
I noticed that the patch needs rebasing, so here is the rebased version.
Hopefully it makes it to the commitfest.

Doruk Yılmaz
From 74a74fd02bce786093c19a23bef9444d0b8ef41d Mon Sep 17 00:00:00 2001
From: Doruk 
Date: Thu, 15 Aug 2024 23:34:26 +0300
Subject: [PATCH v4] pg_replication_origin_session_setup: pid parameter

Since the introduction of parallel apply workers (commit 216a784829c),
the replorigin_session_setup() was extended to accept an extra
parameter: pid. This process ID is used to inform that multiple
processes are sharing the same replication origin to apply changes in
parallel. The replorigin_session_setup function has a SQL user
interface: pg_replication_origin_session_setup. This commit adds an
optional parameter that passes the process ID to the internal function
replorigin_session_setup. It allows multiple processes to use the same
replication origin if you are using the replication functions.
---
 doc/src/sgml/func.sgml   | 8 +++-
 src/backend/catalog/system_functions.sql | 9 -
 src/backend/replication/logical/origin.c | 4 +++-
 src/include/catalog/pg_proc.dat  | 2 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..e50e689fb6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29475,7 +29475,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 
  pg_replication_origin_session_setup
 
-pg_replication_origin_session_setup ( node_name text )
+pg_replication_origin_session_setup ( node_name text , pid integer DEFAULT 0 )
 void


@@ -29483,6 +29483,12 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 origin, allowing replay progress to be tracked.
 Can only be used if no origin is currently selected.
 Use pg_replication_origin_session_reset to undo.
+If multiple processes can safely use the same replication origin (for
+example, parallel apply processes), the optional pid
+parameter can be used to specify the process ID of the first process.
+The first process must pass pid as
+0, and the other processes that share the same
+replication origin should provide the process ID of the first process.

   
 
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 86888cd..ebc4005 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -636,6 +636,13 @@ LANGUAGE INTERNAL
 CALLED ON NULL INPUT VOLATILE PARALLEL SAFE
 AS 'pg_stat_reset_slru';
 
+CREATE OR REPLACE FUNCTION
+  pg_replication_origin_session_setup(node_name text, pid integer DEFAULT 0)
+RETURNS void
+LANGUAGE INTERNAL
+STRICT VOLATILE
+AS 'pg_replication_origin_session_setup';
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
@@ -737,7 +744,7 @@ REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) FROM
 
 REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_reset() FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text, integer) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_replication_origin_xact_reset() FROM public;
 
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 1b586cb1cf..9cbe1eec45 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1355,12 +1355,14 @@ pg_replication_origin_session_setup(PG_FUNCTION_ARGS)
 {
 	char	   *name;
 	RepOriginId origin;
+	int			pid;
 
 	replorigin_check_prerequisites(true, false);
 
 	name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0)));
 	origin = replorigin_by_name(name, false);
-	replorigin_session_setup(origin, 0);
+	pid = PG_GETARG_INT32(1);
+	replorigin_session_setup(origin, pid);
 
 	replorigin_session_origin = origin;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..ea118a0563 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12063,7 +12063,7 @@
 { oid => '6006',
   descr => 'configure session to maintain replication progress tracking for the passed in origin',
   proname => 'pg_replication_origin_session_setup', provolatile => 'v',
-  proparallel => 'u', prorettype => 'void', proargtypes => 'text',
+  proparallel => 'u', prorettype => 'void', proargtypes => 'text int4',
   prosrc => 'pg_replication_origin_session_setup' },
 
 { oid => '6007', descr => 'teardown configured replication progress tracking',
-- 
2.39.5
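
For clarity, a hedged sketch of how the shared-origin protocol above is
meant to be used via the C API (is_leader and leader_pid are assumed to
be coordinated between the processes out of band, e.g. through shared
memory):

```c
/* illustrative fragment; not part of the patch */
RepOriginId origin = replorigin_by_name("my_origin", false);

if (is_leader)
	replorigin_session_setup(origin, 0);			/* leader acquires the origin */
else
	replorigin_session_setup(origin, leader_pid);	/* worker joins the leader */
```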



Re: making EXPLAIN extensible

2025-03-01 Thread Robert Haas
On Fri, Feb 28, 2025 at 3:09 PM Thom Brown  wrote:
> "pg_overexplain"? I love this name! And the idea sounds like a natural
> evolution, so +1.

Thanks. I thought about things like pg_hyperexplain or
pg_explain_debug, but in the end I didn't like any of them better than
overexplain. :-)

> One thing I am wondering is whether extensions should be required to
> prefix their EXPLAIN option with the extension name to avoid
> collisions.

I considered that. One advantage of doing that is that you could
support autoloading. Right now, you have to LOAD 'pg_overexplain' or
put it in session_preload_libraries or shared_preload_libraries in
order to use it. If you required people to type EXPLAIN
(pg_overexplain.range_table) instead of just EXPLAIN (range_table),
then you could react to not finding any such option by trying to
autoload a .so with the part of the name before the dot.
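
A hedged sketch of that lookup-then-autoload flow (GetExplainOption()
is an invented name for the option lookup; load_file() is the existing
backend entry point behind LOAD):

```c
/* illustrative only; not part of this patch set */
static void *
lookup_or_autoload_option(const char *name)
{
	void	   *opt = GetExplainOption(name);	/* invented lookup */
	const char *dot = strchr(name, '.');

	if (opt == NULL && dot != NULL)
	{
		char	   *module = pnstrdup(name, dot - name);

		load_file(module, false);		/* errors out if no such .so */
		opt = GetExplainOption(name);	/* retry after autoloading */
	}
	return opt;
}
```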

But you can probably see that this idea has a couple of pretty serious
weaknesses:

1. It is much more verbose. I theorize that people will be unhappy
about having to type EXPLAIN (pg_overexplain.range_table) rather than
just EXPLAIN (range_table). One could try to address this by renaming
the extension to something shorter, like just 'oe'. Having to type
EXPLAIN (oe.range_table) wouldn't be nearly as annoying. However, this
seems like a pretty clear case of letting the tail wag the dog.

2. autoloading could have security concerns. This is probably fixable,
but we'd need to be sure that providing a new way to trigger loading a
module didn't open up any security holes.

> If two extensions happen to choose the same name, it won't be possible
> to use both simultaneously.

That's true. Of course, a lot depends on whether we end up with 3 or 5
or 8 EXPLAIN extensions or more like 30 or 50 or 80. In the former
case, the people writing those extensions will probably mostly know
about each other and can just use different names. In the latter case
it's a problem. My guess is it's the former case.

> In what order would the options be applied? Would it be deterministic,
> or weighted within the extension's configuration, or based on the
> order of the options in the list?

I'm not entirely sure I know which question you're asking here. If
you're asking what happens if two modules try to register the same
EXPLAIN option name and then a user uses it, one of the registrations
will win and the other will lose. I think the second one wins. As I
say above, I assume we'll find a way to not try to do that. However, I
think more likely you're asking: if you load pg_fingernailexplain and
pg_toenailexplain and then do EXPLAIN (toenail, fingernail) SELECT
..., in what order will the options take effect? For the answer to
that question, see the commit message for 0002.

> Would explain extensions be capable of modifying pre-existing core
> option output, or just append to output?

The interfaces we have are really only going to work for appending.
Modifying would be cool, but I think it's mostly impractical. We have
a framework for emitting stuff into EXPLAIN output in a way that takes
into account whether you're in text mode or json or yaml or whatever,
and this patch just builds on that existing framework to allow you to
make extra calls to those emit-some-output functions at useful places.
As a result, the patch is small and simple. If we had an existing
framework for modifying stuff, then we could perhaps provide suitable
places to call those functions, too. But they don't exist, and it's
not easy to see how they could be created. I think you would need some
kind of major redesign of explain.c, and I don't know how to do that
without making it bloated, slow, and unmaintainable.

If somebody comes up with a way of allowing certain limited types of
modifications to EXPLAIN output with small, elegant-looking code
changes, and if those changes seem like useful things for an extension
to want to do, I'm totally on board. But I currently don't have an
idea like that.

> Should there be a way of determining which lines are output by which
> option? An extension may output similar output to core output, making
> it difficult or impossible to discern which is which.

I don't think this is really going to be a problem.

> Does there need to be any security considerations so that things like
> RLS don't inadvertently become leaky?

It's possible that there may be some security considerations, and
that's worth thinking about. However, RLS disclaims support for
side-channel attacks, so it's already understood to be (very) leaky.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Licence preamble update

2025-03-01 Thread Amit Kapila
On Sat, Mar 1, 2025 at 9:20 AM Noah Misch  wrote:
>
> On Fri, Feb 28, 2025 at 12:07:26PM -0500, Tom Lane wrote:
> >
> > PGCAC holds trademarks on both "PostgreSQL" and "Postgres".  We've
> > been given legal advice that both terms need to be in current use
> > to preserve the trademarks.  Which they are and have been, but the
> > present wording in COPYRIGHT doesn't align with that.  The website
> > phrasing will be adjusted as well.
>
> I'm good with the change, then.
>

LGTM as well.

-- 
With Regards,
Amit Kapila.




Re: Adding NetBSD and OpenBSD to Postgres CI

2025-03-01 Thread Thomas Munro
On Sat, Mar 1, 2025 at 6:24 AM Nazir Bilal Yavuz  wrote:
> I think I found the problem, sd0j's fstype is not a swap. It worked like that:
>
> $ disklabel -E sd0
> $ umount /usr/obj
> $ disklabel -E sd0 # prompts are: m -> j -> \n -> \n -> swap -> w -> q
> $ disklabel -E sd0
> $ swapon /dev/sd0j # runs successfully

Thanks!  I just piped those characters in and it worked, shaving
another couple of minutes off.  The script is probably now about as
short and sweet as it needs to be?  And the times are pretty good.

https://cirrus-ci.com/build/5275349266726912
From 68232b16632fcc9469017ec219308c798e0d5391 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 22:19:07 +1300
Subject: [PATCH v2 1/2] ci: Use a RAM disk for NetBSD and OpenBSD.

Put the RAM disk setup for all three *BSD CI tasks into a common script,
replacing the old FreeBSD-specific one from commit 0265e5c1.  This makes
them run 3 times and a bit over 2 times faster, respectively.

NetBSD and FreeBSD can use the same one-liner to mount tmpfs, but
OpenBSD needs a GCP-image specific recipe that knows where to steal a
disk partition to reserve swap space to mount mfs, because its tmpfs
is deprecated and currently broken.  The configured size is enough for
our current tests but could potentially need future tweaks.  Thanks to
Bilal for the disklabel incantation.

Reviewed-by: Nazir Bilal Yavuz 
Discussion: https://postgr.es/m/CA%2BhUKGJJ-XrPhN%2BQA4ZUfYAAXcwOSDty9t0vE9Z8__AdacKnQg%40mail.gmail.com
---
 .cirrus.tasks.yml   |  5 ++---
 src/tools/ci/gcp_freebsd_repartition.sh | 26 -
 src/tools/ci/gcp_ram_disk.sh| 22 +
 3 files changed, 24 insertions(+), 29 deletions(-)
 delete mode 100755 src/tools/ci/gcp_freebsd_repartition.sh
 create mode 100755 src/tools/ci/gcp_ram_disk.sh

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 91b51142d2e..c5e7b743bfb 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -155,8 +155,7 @@ task:
 
   ccache_cache:
 folder: $CCACHE_DIR
-  # Work around performance issues due to 32KB block size
-  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
+  setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh
   create_user_script: |
 pw useradd postgres
 chown -R postgres:postgres .
@@ -276,7 +275,7 @@ task:
 
   ccache_cache:
 folder: $CCACHE_DIR
-
+  setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh
   create_user_script: |
 useradd postgres
 chown -R postgres:users /home/postgres
diff --git a/src/tools/ci/gcp_freebsd_repartition.sh b/src/tools/ci/gcp_freebsd_repartition.sh
deleted file mode 100755
index 3adb8fb88ec..000
--- a/src/tools/ci/gcp_freebsd_repartition.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/sh
-
-set -e
-set -x
-
-# fix backup partition table after resize
-gpart recover da0
-gpart show da0
-
-# delete and re-add swap partition with expanded size
-swapoff -a
-gpart delete -i 3 da0
-gpart add -t freebsd-swap -l swapfs -a 4096 da0
-gpart show da0
-swapon -a
-
-# create a file system on a memory disk backed by swap, to minimize I/O
-mdconfig -a -t swap -s20g -u md1
-newfs -b 8192 -U /dev/md1
-
-# migrate working directory
-du -hs $CIRRUS_WORKING_DIR
-mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig
-mkdir $CIRRUS_WORKING_DIR
-mount -o noatime /dev/md1 $CIRRUS_WORKING_DIR
-cp -a $CIRRUS_WORKING_DIR.orig/ $CIRRUS_WORKING_DIR/
diff --git a/src/tools/ci/gcp_ram_disk.sh b/src/tools/ci/gcp_ram_disk.sh
new file mode 100755
index 000..d48634512ac
--- /dev/null
+++ b/src/tools/ci/gcp_ram_disk.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+# Move working directory into a RAM disk for better performance.
+
+set -e
+set -x
+
+mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig
+mkdir $CIRRUS_WORKING_DIR
+
+case "`uname`" in
+  FreeBSD|NetBSD)
+mount -t tmpfs tmpfs $CIRRUS_WORKING_DIR
+;;
+  OpenBSD)
+umount /dev/sd0j # unused /usr/obj partition
+printf "m j\n\n\nswap\nw\nq\n" | disklabel -E sd0
+swapon /dev/sd0j
+mount -t mfs -o rw,noatime,nodev,-s=800 swap $CIRRUS_WORKING_DIR
+;;
+esac
+
+cp -a $CIRRUS_WORKING_DIR.orig/. $CIRRUS_WORKING_DIR/
-- 
2.39.5

From cf2f8301f30537d7448190f9a2d0889845b4d168 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 2 Mar 2025 01:08:02 +1300
Subject: [PATCH v2 2/2] ci: Enable NetBSD and OpenBSD by default.

XXX Not saying we're quite ready to do this (are we?), but cfbot will
test them for this CF entry
---
 .cirrus.tasks.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index c5e7b743bfb..2004610989d 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -218,7 +218,6 @@ task:
 
 task:
   depends_on: SanityCheck
-  trigger_type: manual
 
   env:
 # Below are experimentally derived to be a decent choice.
-- 
2.39.5



Re: bug when apply fast default mechanism for adding new column over domain with default value

2025-03-01 Thread jian he
On Sat, Mar 1, 2025 at 2:43 PM Tom Lane  wrote:
>
> I do not believe that case should require a table rewrite.
> Both the default and the check constraint are immutable,
> so we ought to be able to apply the check once and then
> use the default as the attmissingval.
>
> > Attach a patch to fix this issue by cause it to table rewrite.
>
> I see no attached patch, but in any case forcing a table rewrite
> seems like the wrong direction...
>

Forcing a table rewrite would be an easier fix, but not forcing a table
rewrite seems doable.

 \d pg_attrdef
  Table "pg_catalog.pg_attrdef"
 Column  | Type | Collation | Nullable | Default
-+--+---+--+-
 oid | oid  |   | not null |
 adrelid | oid  |   | not null |
 adnum   | smallint |   | not null |
 adbin   | pg_node_tree | C | not null |
Indexes:
"pg_attrdef_oid_index" PRIMARY KEY, btree (oid)
"pg_attrdef_adrelid_adnum_index" UNIQUE CONSTRAINT, btree (adrelid, adnum)

pg_attrdef_adrelid_adnum_index means a column can have only one default
expression (adbin). If we also store the domain's default expression in
pg_attrdef, we may get an error like:

CREATE DOMAIN int_domain AS int DEFAULT 11;
ALTER TABLE t2 ADD COLUMN b int_domain default 3;
ERROR:  duplicate key value violates unique constraint
"pg_attrdef_adrelid_adnum_index"
DETAIL:  Key (adrelid, adnum)=(18102, 2) already exists.

Currently, the function StoreAttrDefault will:
1. set pg_attribute.atthasdef
2. compute and set atthasmissing, attmissingval
3. insert an entry in pg_attrdef.
But we only want step 2.

So I duplicated StoreAttrDefault and removed the pg_attribute.atthasdef
and pg_attrdef-related code, and it works fine.
From 4f87b744b19f34b02817d252cc69666e33da2e18 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 1 Mar 2025 22:01:21 +0800
Subject: [PATCH v1 1/1] apply fast default for adding new column over domain
 with default value

discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kqgl+tnvox5lo...@mail.gmail.com
---
 src/backend/catalog/pg_attrdef.c   | 108 +
 src/backend/commands/tablecmds.c   |  26 -
 src/include/catalog/pg_attrdef.h   |   3 +
 src/test/regress/expected/fast_default.out |  49 ++
 src/test/regress/sql/fast_default.sql  |  36 +++
 5 files changed, 220 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/pg_attrdef.c b/src/backend/catalog/pg_attrdef.c
index 91dafc81021..3b47139f94f 100644
--- a/src/backend/catalog/pg_attrdef.c
+++ b/src/backend/catalog/pg_attrdef.c
@@ -31,6 +31,114 @@
 #include "utils/syscache.h"
 
 
+
+/*
+ * XXX is this the right place?
+ * Store an attmissingval datum for column attnum of relation rel.
+ *
+ * This closely resembles StoreAttrDefault: it sets atthasmissing to true
+ * and adds attmissingval for the specified attnum, but with two
+ * differences:
+ * - Does not set pg_attribute.atthasdef to true.
+ * - Does not store the default expression in pg_attrdef.
+ */
+void
+StoreAttrMissingVal(Relation rel, AttrNumber attnum,
+	Node *expr, bool is_internal, bool add_column_mode)
+{
+	Relation	attrrel;
+	HeapTuple	atttup;
+	Form_pg_attribute attStruct;
+	Form_pg_attribute defAttStruct;
+
+	ExprState  *exprState;
+	Expr	   *expr2 = (Expr *) expr;
+	EState	   *estate = NULL;
+	ExprContext *econtext;
+	Datum		valuesAtt[Natts_pg_attribute] = {0};
+	bool		nullsAtt[Natts_pg_attribute] = {0};
+	bool		replacesAtt[Natts_pg_attribute] = {0};
+	Datum		missingval = (Datum) 0;
+	bool		missingIsNull = true;
+
+	/*
+	 * Update the pg_attribute entry for the column to show that we store attmissingval
+	 * on it.
+	 */
+	attrrel = table_open(AttributeRelationId, RowExclusiveLock);
+	atttup = SearchSysCacheCopy2(ATTNUM,
+ ObjectIdGetDatum(RelationGetRelid(rel)),
+ Int16GetDatum(attnum));
+	if (!HeapTupleIsValid(atttup))
+		elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+			 attnum, RelationGetRelid(rel));
+	attStruct = (Form_pg_attribute) GETSTRUCT(atttup);
+
+	/*
+	 * Since we only store the evaluated default value in attmissingval and
+	 * do not store a default expression entry in pg_attrdef, we must not
+	 * set pg_attribute.atthasdef (Anum_pg_attribute_atthasdef) to true.
+	 * That's the main difference from StoreAttrDefault.
+	 */
+	if (!attStruct->atthasdef && attStruct->attgenerated == '\0' &&
+		rel->rd_rel->relkind == RELKIND_RELATION && add_column_mode)
+	{
+		expr2 = expression_planner(expr2);
+		estate = CreateExecutorState();
+		exprState = ExecPrepareExpr(expr2, estate);
+		econtext = GetPerTupleExprContext(estate);
+
+		missingval = ExecEvalExpr(exprState, econtext,
+	&missingIsNull);
+
+		FreeExecutorState(estate);
+
+		defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1);
+
+		if (missingIsNull)
+			missingval = (Datum) 0;
+		else
+		{
+			/* make a one-element array of the value */
+			miss

Re: Statistics Import and Export

2025-03-01 Thread Alexander Lakhin

Hello Tom,

01.03.2025 20:04, Tom Lane wrote:

Alexander Lakhin  writes:

It looks like 8f427187d broke pg_dump on Cygwin:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07

Yeah, Andrew and I have been puzzling over that off-list.  pg_dump
is definitely exiting unceremoniously.


As far as I can see, it exits prematurely here:
    float   reltuples = strtof(PQgetvalue(res, i, i_reltuples), NULL);

I was suspecting those float conversions as a likely cause, but
what do you think is wrong exactly?  I see nothing obviously
buggy in pg_strtof().


From my understanding, pg_strtof() can't cope with endptr == NULL.
I have changed that line to:
    char *tptr;
    float    reltuples = strtof(PQgetvalue(res, i, i_reltuples), &tptr);

and 002_compare_backups passed for me.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)




Re: Statistics Import and Export

2025-03-01 Thread Tom Lane
Alexander Lakhin  writes:
> It looks like 8f427187d broke pg_dump on Cygwin:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-02-26%2010%3A03%3A07

Yeah, Andrew and I have been puzzling over that off-list.  pg_dump
is definitely exiting unceremoniously.

> As far as I can see, it exits prematurely here:
>     float   reltuples = strtof(PQgetvalue(res, i, i_reltuples), NULL);

I was suspecting those float conversions as a likely cause, but
what do you think is wrong exactly?  I see nothing obviously
buggy in pg_strtof().

But I'm not sure it's worth running to ground.  I don't love any of
the portability-related hacks that 8f427187d made: the setlocale()
call looks like something with an undesirably large blast radius,
and pg_dump has never made use of strtof or f2s.c before.  Sure,
those *ought* to work, but they evidently don't work everywhere,
and I don't especially want to expend more brain cells figuring out
what's wrong here.  I think we ought to cut our losses and store
reltuples in string form, as Corey wanted to do originally.

regards, tom lane




Re: Statistics Import and Export

2025-03-01 Thread Greg Sabino Mullane
> Can you expand on some of those cases?

Certainly. I think one of the problems is that because this patch is
solving a pg_upgrade issue, the focus is on the "dump and restore"
scenarios. But pg_dump is used for much more than that, especially "dump
and examine".

Although pg_dump is meant to be a canonical, logical representation of your
schema and data, the stats add a non-deterministic element to that.
Statistical sampling is random, so pg_dump output changes with each run.
(yes, COPY can also change, but much less so, as I argue later).

One use case is a program that is simply using pg_dump to verify that
nothing has modified your table data (I'll use a single table for these
examples, but obviously this applies to a whole database as well). So let's
say we create a table and populate it at time X, then check back at a later
time to verify things are still exactly as we left them.

dropdb gregtest
createdb gregtest
pgbench gregtest -i 2> /dev/null
pg_dump gregtest -t pgbench_accounts > a1
sleep 10
pg_dump gregtest -t pgbench_accounts > a2
diff a1 a2 | cut -c1-50

100078c100078
<   'histogram_bounds', '{2,964,1921,2917,3892,4935
---
>   'histogram_bounds', '{7,989,1990,2969,3973,4977

While COPY is not going to promise a particular output order, the order
should not change except for manual things: insert, update, delete,
truncate, vacuum full, cluster (off the top of my head). What should not
change the output is a background process gathering some metadata. Or
someone running a database-wide ANALYZE.


Another use case is someone rolling out their schema to a QA box. All the
table definitions and data are checked into a git repository, with a
checksum. They want to roll it out, and then verify that everything is
exactly as they expect it to be. Or the program is part of a test suite
that does a sanity check that the database is in an exact known state
before starting.

(Our system catalogs are very difficult to work with when reverse engineering objects.
Thus, many programs rely on pg_dump to do the heavy lifting for them.
Parsing the text file generated by pg_dump is much easier than trying to
manipulate the system catalogs.)

So let's say the process is to create a new database, load things into it,
and then checksum the result. We can simulate that with pg_bench:

dropdb qa1; dropdb qa2
createdb qa1; createdb qa2
pgbench qa1 -i 2>/dev/null
pgbench qa2 -i 2>/dev/null
pg_dump qa1 > dump1; pg_dump qa2 > dump2

$ md5sum dump1
39a2da5e51e8541e9a2c025c918bf463  dump1

This md5sum does not match our repo! It doesn't even match the other one:

$ md5sum dump2
4a977657dfdf910cb66c875d29cfebf2  dump2

It's the stats, of course, which have added a dose of randomness that was
not there before and make our checksums useless:

$ diff dump1 dump2 | cut -c1-50
100172c100172
<   'histogram_bounds', '{1,979,1974,2952,3973,4900
---
>   'histogram_bounds', '{8,1017,2054,3034,4045,513

With --no-statistics, the diff shows no difference, and the md5sum is
always the same.

Just to be clear, I love this patch, and I love the fact that one of our
major upgrade warts is finally getting fixed. I've tried fixing it myself a
few times over the last decade or so, but lacked the skills to do so. :) So
I am thrilled to have this finally done. I just don't think it should be
enabled by default for everything using pg_dump. For the record, I would
not strongly object to having stats on by default for binary dumps,
although I would prefer them off.

So why not just expect people to modify their programs to use
--no-statistics for cases like this? That's certainly an option, but it's
going to break a lot of existing things, and create branching code:

old code:
pg_dump mydb -f pg.dump

new code:
if pg_dump.version >= 18
  pg_dump --no-statistics mydb -f pg.dump
else
  pg_dump mydb -f pg.dump

Also, anything trained to parse pg_dump output will have to learn about the
new SELECT pg_restore_ calls with their multi-line formats (not 100% sure
we don't have that anywhere, as things like "SELECT setval" and "SELECT
set_config" are single line, but there may be existing things)


Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Statistics Import and Export

2025-03-01 Thread Tom Lane
Alexander Lakhin  writes:
> 01.03.2025 20:04, Tom Lane wrote:
>> I was suspecting those float conversions as a likely cause, but
>> what do you think is wrong exactly?  I see nothing obviously
>> buggy in pg_strtof().

>  From my understanding, pg_strtof() can't cope with endptr == NULL.

D'oh!  I'm blind as a bat today.

> I have changed that line to:
>      char *tptr;
>      float    reltuples = strtof(PQgetvalue(res, i, i_reltuples), 
> &tptr);
> and 002_compare_backups passed for me.

Cool, but surely the right fix is to make pg_strtof() adhere to
the POSIX specification, so we don't have to learn this lesson
again elsewhere.  I'll go make it so.
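
A hedged sketch of the shape such a fix could take (simplified; the
real pg_strtof() also carries the Cygwin misrounding checks quoted
upthread):

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

float
pg_strtof_sketch(const char *nptr, char **endptr)
{
	char	   *end;
	double		dresult;
	float		fresult;

	errno = 0;
	dresult = strtod(nptr, &end);	/* always use a local end pointer */
	fresult = (float) dresult;

	if (endptr)
		*endptr = end;				/* only dereference when supplied */

	/* flag narrowing overflow/underflow that the cast silently produces */
	if (end != nptr && errno == 0)
	{
		if (isinf(fresult) && !isinf(dresult))
			errno = ERANGE;
		else if (fresult == 0.0f && dresult != 0.0)
			errno = ERANGE;
	}

	return fresult;
}
```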

Independently of that, do we want to switch over to storing
reltuples as a string instead of converting it?  I still feel
uncomfortable about the amount of baggage we added to pg_dump
to avoid that.

regards, tom lane




Re: Statistics Import and Export

2025-03-01 Thread Jeff Davis
On Sat, 2025-03-01 at 13:52 -0500, Greg Sabino Mullane wrote:
> > Can you expand on some of those cases?
> 
> Certainly. I think one of the problems is that because this patch is
> solving a pg_upgrade issue, the focus is on the "dump and restore"
> scenarios. But pg_dump is used for much more than that, especially
> "dump and examine".

Thank you for going through these examples.

> I just don't think it should be enabled by default for everything
> using pg_dump. For the record, I would not strongly object to having
> stats on by default for binary dumps, although I would prefer them
> off.

I am open to that idea; I just want to get it right, because whatever
the default is in 18 will probably stay that way.

Also, we will need to think through the set of pg_dump options again. A
lot of our tools seem to assume that "if it's the default, we don't
need a way to ask for it explicitly", which makes it a lot harder to
ever change the default and keep a coherent set of options.

> So why not just expect people to modify their programs to use --no-
> statistics for cases like this? That's certainly an option, but it's
> going to break a lot of existing things, and create branching code:

I suggest that we wait a bit to see what additional feedback we get
early in beta.

> Also, anything trained to parse pg_dump output will have to learn
> about the new SELECT pg_restore_ calls with their multi-line formats
> (not 100% sure we don't have that anywhere, as things like "SELECT
> setval" and "SELECT set_config" are single line, but there may be
> existing things)

That's an interesting point. What tools are currently trying to parse
pg_dump output?

Regards,
Jeff Davis





Re: access numeric data in module

2025-03-01 Thread Ed Behn
Tom-
I think that I can allay your concerns. Please give me a day or so to
put together my more complete thoughts on the matter. I'll be in touch.

-Ed

On Sat, Mar 1, 2025 at 11:33 AM Tom Lane  wrote:

> Ed Behn  writes:
> >> There is actually no new code. Code is simply moved from numeric.c to
> >> numeric.h.
>
> I will absolutely not hold still for that.  It would mean that any
> time we want to think about messing with the contents of numerics,
> we need to examine more or less the whole Postgres code base to see
> what else is poking into those structures.
>
> If we must do something like this, then a separate header
> "numeric_internal.h" or something like that would reduce the blast
> radius for changes.  But IMO you still haven't made an acceptable case
> for deciding that these data structures aren't private to numeric.c.
> What behaviors do you actually need that aren't accessible via the
> existing exported functons?
>
> regards, tom lane
>


Re: Statistics Import and Export

2025-03-01 Thread Jeff Davis
On Fri, 2025-02-28 at 14:56 -0600, Nathan Bossart wrote:
> On Fri, Feb 28, 2025 at 12:54:03PM -0800, Jeff Davis wrote:
> > (Aside: I assume everyone here agrees that pg_upgrade should
> > transfer
> > the stats by default.)
> 
> That feels like a safe assumption to me...  I'm curious what the
> use-case is for pg_upgrade's --no-statistics option.

Mostly completeness and paranoia. I don't see a real use case. If we
decide we don't need it, that's fine with me.

Regards,
Jeff Davis