Re: Make pg_stat_io view count IOs as bytes instead of blocks

2024-12-26 Thread Nazir Bilal Yavuz
Hi,

On Tue, 24 Dec 2024 at 09:13, Michael Paquier  wrote:
>
> On Fri, Dec 06, 2024 at 12:41:55PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! I think 'track' is a better word in this context. I used
> > 'tracked in ...', as it sounded more correct to me (I hope it is).
>
> Splitting op_bytes into three fields sounds like a good idea.  Count
> me in regarding the concept to depend less on BLCKSZ.
>
>  typedef enum IOOp
>  {
> +   /* IOs not tracked in bytes */
> IOOP_EVICT,
> -   IOOP_EXTEND,
> IOOP_FSYNC,
> IOOP_HIT,
> -   IOOP_READ,
> IOOP_REUSE,
> -   IOOP_WRITE,
> IOOP_WRITEBACK,
> +
> +   /* IOs tracked in bytes */
> +   IOOP_EXTEND,
> +   IOOP_READ,
> +   IOOP_WRITE,
>  } IOOp;
>
> pg_stat_io_build_tuples() is now the code path taken when building the
> tuples returned by pg_stat_io, meaning that the new function called
> pg_stat_get_backend_io() is also going to need an update in its list
> of output parameters to accommodate what you are changing here.
> Calling your new pgstat_get_io_byte_index() in the new refactored
> routine is also necessary.  Sorry about that.

No problem at all, thank you for reminding me!

> Could you send a rebase, please?  I can promise to review this
> thread's new patch, as that will also move the needle regarding your
> work with pg_stat_io to track WAL activity.

Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io()
function and it appears to be working.
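
In case it helps review, the quick checks I ran look roughly like this
(column names as defined by this patch's pg_proc.dat changes):

-- byte counters next to the request counters in the view
SELECT backend_type, object, context,
       reads, read_bytes, writes, write_bytes, extends, extend_bytes
FROM pg_stat_io
WHERE reads > 0 OR writes > 0 OR extends > 0;

-- the per-backend variant, here for the current backend
SELECT * FROM pg_stat_get_backend_io(pg_backend_pid());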

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From b5c8b86aebd1149b30bc1715921edef161f99ba7 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 26 Dec 2024 12:05:13 +0300
Subject: [PATCH v4] Make pg_stat_io count IOs as bytes instead of blocks

Currently, the pg_stat_io view counts I/Os as blocks. There are two
problems with this approach:

1- The number of I/O requests actually sent to the kernel is lower,
because I/O requests may be merged before being sent. Counting blocks
also gives the impression that all I/Os are done in block-sized chunks,
which obscures the benefits of merging I/O requests.

2- In the future, some I/Os may not be done in block size at all. For
example, WAL read I/Os are done in variable sizes, and it would not be
possible to show these I/Os correctly in the pg_stat_io view.

Because of these problems, show the number of I/O requests sent to the
kernel (as smgr function calls) and the total number of bytes in those
I/Os instead. The op_bytes column is removed from the pg_stat_io view.
---
 src/include/catalog/pg_proc.dat | 12 +--
 src/include/pgstat.h| 39 --
 src/backend/catalog/system_views.sql|  4 +-
 src/backend/storage/buffer/bufmgr.c | 14 ++--
 src/backend/storage/buffer/localbuf.c   |  6 +-
 src/backend/storage/smgr/md.c   |  4 +-
 src/backend/utils/activity/pgstat_backend.c |  2 +
 src/backend/utils/activity/pgstat_io.c  | 16 ++--
 src/backend/utils/adt/pgstatfuncs.c | 86 -
 src/test/regress/expected/rules.out |  6 +-
 doc/src/sgml/monitoring.sgml| 61 +--
 11 files changed, 172 insertions(+), 78 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2dcc2d42dac..00051d41e0f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5908,18 +5908,18 @@
   proname => 'pg_stat_get_io', prorows => '30', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  proallargtypes => '{text,text,text,int8,numeric,float8,int8,numeric,float8,int8,float8,int8,numeric,float8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_bytes,read_time,writes,write_bytes,write_time,writebacks,writeback_time,extends,extend_bytes,extend_time,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
 { oid => '8806', descr => 'statistics: backend IO statistics',
   proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
   proargtypes => 'int4',
-  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hit

Removing unused parameter in compute_expr_stats

2024-12-26 Thread Ilia Evdokimov

Hi hackers,

I'm sharing a small patch that removes an unused parameter (totalrows)
from the compute_expr_stats() function in extended_stats.c.


This function originally appeared in commit
a4d75c86bf15220df22de0a92c819ecef9db3849, which introduced extended
statistics on expressions. From what I can see, totalrows is never used
in the current implementation, so this patch removes it.


If there are reasons to keep this parameter, or if there's anything I 
might have missed, I'd be happy to discuss further.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

From 0de17516bec761474761bf2862724b23ccb11beb Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Thu, 26 Dec 2024 12:09:14 +0300
Subject: [PATCH v1] Remove unused totalrows parameter in compute_expr_stats

The totalrows parameter in compute_expr_stats is never referenced in the
function or by its callers. It appears to have been introduced in commit
a4d75c86bf15220df22de0a92c819ecef9db3849, but it has remained unused.
---
 src/backend/statistics/extended_stats.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 99fdf208db..85da3edb10 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -89,9 +89,8 @@ typedef struct AnlExprData
 	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
 } AnlExprData;
 
-static void compute_expr_stats(Relation onerel, double totalrows,
-			   AnlExprData *exprdata, int nexprs,
-			   HeapTuple *rows, int numrows);
+static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
+			   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -220,8 +219,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 exprdata = build_expr_data(stat->exprs, stattarget);
 nexprs = list_length(stat->exprs);
 
-compute_expr_stats(onerel, totalrows,
-   exprdata, nexprs,
+compute_expr_stats(onerel, exprdata, nexprs,
    rows, numrows);
 
 exprstats = serialize_expr_stats(exprdata, nexprs);
@@ -2107,9 +2105,8 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
  * Compute statistics about expressions of a relation.
  */
 static void
-compute_expr_stats(Relation onerel, double totalrows,
-   AnlExprData *exprdata, int nexprs,
-   HeapTuple *rows, int numrows)
+compute_expr_stats(Relation onerel, AnlExprData *exprdata,
+   int nexprs, HeapTuple *rows, int numrows)
 {
 	MemoryContext expr_context,
 old_context;
-- 
2.34.1



Re: RFC: Allow EXPLAIN to Output Page Fault Information

2024-12-26 Thread Jeremy Schneider
On Thu, 26 Dec 2024 13:15:59 +0900
torikoshia  wrote:

> On 2024-12-25 00:52, Tom Lane wrote:
> > torikoshia  writes:  
> >> I have attached a PoC patch that modifies EXPLAIN to include page fault
> >> information during both the planning and execution phases of a query.
> > 
> > Surely these numbers would be too unstable to be worth anything.  
> 
> Thanks for your comment!
> 
> Hmm, I didn't know these were unstable. If there are reasons, I'd
> like to know about them.
> 
> I would like to explore alternative methods for measuring resource
> usage, but I am not aware of other approaches.
> (IIUC pg_stat_kcache [1], which is said to provide information about
> filesystem-layer I/O usage, also gets metrics from getrusage(2).)

What I'd really like to see is a column added to pg_stat_database
called blks_read_majflts.

It would be great if we could calculate a cache hit ratio that took OS
major page faults into account.

Yes, this could also be done in pg_stat_kcache, but why not do it right
in pg_stat_database? I think it would be pretty widely appreciated and
used.
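
For illustration, with such a column the ratio could be computed along
these lines (blks_read_majflts is hypothetical; it exists in no released
version):

SELECT datname,
       (blks_hit + blks_read - blks_read_majflts)::float8
         / nullif(blks_hit + blks_read, 0) AS os_aware_hit_ratio
FROM pg_stat_database;

That would treat reads satisfied from the OS page cache as hits and only
major faults as true misses.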

-Jeremy





Re: Fix crash when non-creator being an iteration on shared radix tree

2024-12-26 Thread Masahiko Sawada
On Fri, Dec 20, 2024 at 9:55 PM John Naylor  wrote:
>
> On Sat, Dec 21, 2024 at 2:17 AM Masahiko Sawada  wrote:
> >
> > On Fri, Dec 20, 2024 at 2:27 AM John Naylor  wrote:
>
> > > v3-0001 allocates the iter data in the caller's context. It's a small
> > > patch, but still a core behavior change so proposed for master-only. I
> > > believe your v1 is still the least invasive fix for PG17.
> >
> > I agree to use v1 for v17.
>
> Okay, did you want to commit that separately, or together with my 0001
> on master? Either way, I've put a bit more effort into the commit
> message in v4 and adjusted the comment slightly.

I think we can commit them separately. It's a pure bug fix for the back
branches, while on HEAD the bug goes away as part of an improvement.

>
> > > It's trivial after 0001-02: 0003 makes one test use slab as the
> > > passed context (only 32-bit systems would actually use it
> > > currently).
> >
> > These changes make sense to me. Here are a few comments:
> >
> > RT_RADIX_TREE has 'leaf_context' but it seems that we use it only for
> > local memory. Do we want to declare only in the !RT_SHMEM case?
>
> That's already the case, if I understand your statement correctly.

True. Sorry, I misunderstood something.

>
> > ---
> > /*
> >  * Similar to TidStoreCreateLocal() but create a shared TidStore on a
> >  * DSA area. The TID storage will live in the DSA area, and the memory
> >  * context rt_context will have only meta data of the radix tree.
> >  *
> >  * The returned object is allocated in backend-local memory.
> >  */
> >
> > We need to update the comment about rt_context too since we no longer
> > use rt_context for shared tidstore.
>
> Fixed.

The three patches look good to me.

> BTW, it seems TidStore.context is unused?

Indeed. We can remove it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Logging parallel worker draught

2024-12-26 Thread Benoit Lobréau

This is just a rebase.

As stated before, I added some information to the error message for
parallel queries as an experiment. It can be removed if you are not
convinced.
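
For anyone who wants to try it, the GUC from the first patch can be
enabled like any other logging parameter (values as documented in the
patch's commit message):

ALTER SYSTEM SET log_parallel_workers = 'failures';
SELECT pg_reload_conf();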


---
Benoit Lobréau
Consultant
http://dalibo.com
From 7e63f79226357f2fe84acedb950e64383e582b17 Mon Sep 17 00:00:00 2001
From: benoit 
Date: Tue, 8 Oct 2024 12:39:41 +0200
Subject: [PATCH 1/5] Add a guc for parallel worker logging

The new GUC log_parallel_workers controls whether a log message is
produced to display information on the number of workers spawned when a
parallel query or utility statement is executed.

The default value is `none` which disables logging. `all` displays
information for all parallel queries, whereas `failures` displays
information only when the number of workers launched is lower than the
number of planned workers.

This new parameter can help database administrators and developers
diagnose performance issues related to parallelism and optimize the
configuration of the system accordingly.
---
 doc/src/sgml/config.sgml  | 18 ++
 src/backend/access/transam/parallel.c | 15 +++
 src/backend/utils/misc/guc_tables.c   | 12 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/parallel.h | 19 +++
 5 files changed, 65 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fbdd6ce574..3a5f7131b3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7752,6 +7752,24 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+
+  log_parallel_workers (enum)
+  
+   log_parallel_workers configuration parameter
+  
+  
+  
+   
+Controls whether a log message about the number of workers is produced during the
+execution of a parallel query or utility statement. The default value is
+none which disables logging. all displays
+information for all parallel queries, whereas failures displays
+information only when the number of workers launched is lower than the number of
+planned workers.
+   
+  
+ 
+
  
   log_parameter_max_length (integer)
   
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0a1e089ec1..e1d378efa6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1653,3 +1653,18 @@ LookupParallelWorkerFunction(const char *libraryname, const char *funcname)
 	return (parallel_worker_main_type)
 		load_external_function(libraryname, funcname, true, NULL);
 }
+
+/*
+ * Determine whether information about parallel workers should be
+ * logged.
+ */
+bool
+LoggingParallelWorkers(int log_parallel_workers,
+	   int parallel_workers_to_launch,
+	   int parallel_workers_launched)
+{
+	return ((log_parallel_workers == LOG_PARALLEL_WORKERS_ALL &&
+			parallel_workers_to_launch > 0) ||
+			(log_parallel_workers == LOG_PARALLEL_WORKERS_FAILURE &&
+			parallel_workers_to_launch != parallel_workers_launched));
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8cf1afbad2..cc4d70177d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -482,6 +482,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry wal_sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry log_parallel_workers_options[];
 
 /*
  * GUC option variables that are exported from this module
@@ -526,6 +527,7 @@ int			log_min_duration_statement = -1;
 int			log_parameter_max_length = -1;
 int			log_parameter_max_length_on_error = 0;
 int			log_temp_files = -1;
+int			log_parallel_workers = LOG_PARALLEL_WORKERS_NONE;
 double		log_statement_sample_rate = 1.0;
 double		log_xact_sample_rate = 0;
 char	   *backtrace_functions;
@@ -5226,6 +5228,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_parallel_workers", PGC_SUSET, LOGGING_WHAT,
+			gettext_noop("Log information about parallel worker usage"),
+			NULL
+		},
+		&log_parallel_workers,
+		LOG_PARALLEL_WORKERS_NONE, log_parallel_workers_options,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a2ac7575ca..f11bf173ff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -622,6 +622,7 @@
 #log_temp_files = -1			# log temporary files equal or larger
 	# than the specified size in kilobytes;
 	# -1 disables, 0 logs all temp files
#log_parallel_workers = none		# none, all, failures
 #

RE: Psql meta-command conninfo+

2024-12-26 Thread Maiquel Grassi
>Hi,
>
>I provided a patch a month ago with a new approach as suggested by David. 
>Unfortunately, it didn't get any attention in the November CF.
>
>I would appreciate your feedback on whether we should proceed with this new 
>approach or stick with the previous one.
>
>Regards,
>Hunaid Sohail
--//--
Hi,
The previous version was good to go and ready for a commit as soon as the
final review is finalized. About David's proposal, I'm still a little
unsure, but it seems like it has a lot of potential. What do you all think?
If it's a strong direction to explore, maybe David could join us as a
co-author and work with us to bring the feature to completion.
Regards,
Maiquel.



Re: Memory leak in pg_logical_slot_{get,peek}_changes

2024-12-26 Thread vignesh C
On Thu, 12 Dec 2024 at 20:32, vignesh C  wrote:
>
> On Wed, 11 Dec 2024 at 11:39, Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, December 11, 2024 12:28 PM vignesh C  
> > wrote:
> > > Hi,
> > >
> > > I'm starting a new thread for one of the issues reported by Sawada-san at 
> > > [1].
> > >
> > > This is a memory leak on CacheMemoryContext when using pgoutput via SQL
> > > APIs:
> > > /* Map must live as long as the session does. */
> > > oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > >
> > > entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
> > >
> > > MemoryContextSwitchTo(oldctx);
> > > RelationClose(ancestor);
> > >
> > > entry->attrmap is pfree'd only when validating the RelationSyncEntry,
> > > so it remains allocated even after the logical decoding API calls.
> > >
> > > This issue can be reproduced with the following test:
> > > -- Create table
> > > create table t1( c1 int, c2 int, c3 int) PARTITION BY LIST (c1);
> > > create table t1_1( c2 int, c1 int, c3 int);
> > > ALTER TABLE t1 ATTACH PARTITION t1_1 FOR VALUES IN (0, 1, 2, 3);
> > >
> > > -- Create publication
> > > create publication pub1 FOR TABLE t1, t1_1 WITH
> > > (publish_via_partition_root = true);
> > >
> > > -- Create replication slot
> > > SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput');
> > >
> > > -- Run the below steps between Test start and Test end many times to
> > > see the memory usage getting increased
> > > -- Test start
> > > insert into t1 values( 1);
> > > SELECT count(*) FROM pg_logical_slot_get_binary_changes('test', NULL,
> > > NULL, 'proto_version', '4', 'publication_names', 'pub1');
> > >
> > > -- Memory increases after each invalidation and insert
> > > SELECT * FROM pg_backend_memory_contexts where name =
> > > 'CacheMemoryContext';
> > > -- Test end
> > >
> > > The attached patch resolves a memory leak by ensuring that the
> > > attribute map is properly freed during plugin shutdown. This process
> > > is triggered by the SQL API when the decoding context is being
> > > released. Additionally, I am conducting a review to identify and
> > > address any similar memory leaks that may exist elsewhere in the code.
> >
> > Thanks for reporting the issue and sharing the fix.
> >
> > I am not sure if freeing them in the shutdown callback is safe, because the
> > shutdown callback is not invoked in case of ERRORs. I think we'd better
> > allocate them under cachectx from the beginning to avoid freeing them
> > explicitly.
> >
> > The attachment is a PoC that I experimented with internally before. To be
> > consistent, it changes not only the attrmap's context but also other data
> > that was allocated under CacheMemoryContext. Note that I have not tested
> > it deeply. Feel free to use it if it works for you.
> >
> > But the patch can only be used since PG15, where cachectx was introduced.
> > In older branches, we may need to think of other ways.
>
> Since we cannot add a new member to PGOutputData, how about creating a
> new global static memory context specifically for storing the relation
> entry attribute map? This memory context could be allocated and reset
> during pgoutput_startup. This way, if a SQL call is made after an
> error, the context will be reset, preventing memory bloat as addressed
> in the attached patch. Thoughts?

Here is an updated version which registers a reset of the memory
context, in line with the recently committed patch at [1].
[1] - 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=cfd6cbcf9be078fbdf9587014231a5772039b386
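
For readers skimming the thread, the crux of the v2 fix is just switching
init_tuple_slot() to the plugin's cache context (a condensed sketch; see
the attached patch for the real change):

	/* Allocate the attribute map in the plugin's cachectx instead of
	 * CacheMemoryContext, so it is released when the decoding context
	 * goes away, e.g. at the end of pg_logical_slot_get_changes(). */
	oldctx = MemoryContextSwitchTo(data->cachectx);
	entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
	MemoryContextSwitchTo(oldctx);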

Regards,
Vignesh
From abf964263e5de78d44f6235f90789cfe5143b534 Mon Sep 17 00:00:00 2001
From: Vignesh 
Date: Thu, 26 Dec 2024 11:37:43 +0530
Subject: [PATCH v2] Fix memory leak in pgoutput relation attribute map

The pgoutput module caches the relation attribute map only when validating
the RelationSyncEntry.  However, the map was not freed in the case of SQL
functions like pg_logical_slot_{get,peek}_changes(), which bloated the
cache memory usage.  Every pg_logical_slot_{get,peek}_changes() call for
changes on a partition table created more bloat.  To address this, the
relation attribute map is now allocated in the plugin's cachectx, which is
freed at the end of pg_logical_slot_{get,peek}_changes() execution.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6db780d733..6c19cc1bb2 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1197,8 +1197,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 		TupleDesc	indesc = RelationGetDescr(relation);
 		TupleDesc	outdesc = RelationGetDescr(ancestor);
 
-		/* Map must live as long as the session does. */
-		oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+		/* Map must live as long as the logical decoding context. */

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-12-26 Thread Michail Nikolaev
Hello, Hou!

> So, I think it's not a bug in the committed patch but an issue in the
> testing environment. Besides, since we have not seen such failures on
> BF, I think it may not be necessary to improve the testcases.

Thanks for your analysis!
Yes, probably WSL2/Windows interactions cause strange system clock
movement.
It looks like it is a common issue with WSL2 [0].

[0]: https://github.com/microsoft/WSL/issues/10006

Best regards,
Mikhail.


Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-12-26 Thread Bruce Momjian
On Mon, Dec 23, 2024 at 01:56:24PM +0900, Michael Paquier wrote:
> On Thu, Dec 12, 2024 at 03:40:32PM +0800, jian he wrote:
> > remove pg_type typndims column patch attached.
> 
> FWIW, I have been paying more attention to applications that may use
> this attribute and bumped into quite a few cases that use quals based
> on (t.typndims > 0 AND t.typbasetype > 0) to check that they're
> dealing with domains over array types.  So even removing this switch
> would be unwise, I am afraid..

Well, I would be more excited about keeping the fields if they actually
were reliable in recording information.  This email from November 2023
explains the limitations of attndims:

   
https://www.postgresql.org/message-id/flat/20150707072942.1186.98...@wrigleys.postgresql.org

*  dimensions not dumped by pg_dump
*  dimensions not propagated by CREATE TABLE ... (LIKE)
*  dimensions not propagated by CREATE TABLE AS
*  dimensions not displayed by psql
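
For example, the CREATE TABLE AS case looks like this (the exact values
assume the behavior described in that thread):

CREATE TABLE t (a int[][]);
SELECT attndims FROM pg_attribute
  WHERE attrelid = 't'::regclass AND attname = 'a';   -- reports 2

CREATE TABLE t2 AS SELECT * FROM t;
SELECT attndims FROM pg_attribute
  WHERE attrelid = 't2'::regclass AND attname = 'a';  -- reports 0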

So, if users are referencing attndims and the values are not accurate,
how useful are they?  I was suggesting removal so people would stop
relying on inaccurate information, or we correct attndims to be
accurate.  Allowing people to continue relying on inaccurate information
seems like the worst of all options.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: Add XMLNamespaces to XMLElement

2024-12-26 Thread Jim Jones
Hi Umar, hi Pavel,

Thanks for taking a look at this patch!

On 26.12.24 05:15, Umar Hayat wrote:
> Hi,
> +1 for the enhancement.
>
> I haven't compiled and reviewed the full patch yet, please see a few
> comments from my side based on static analysis.
>
> 1. Though the changes target XMLNAMESPACES for XMLElement, in my
> opinion they will affect XMLTABLE as well, because the
> 'xml_namespace_list' rule is now shared.
> Adding 'NO DEFAULT' to xml_namespace_list will allow users to use it
> with XMLTABLE's XMLNAMESPACES as well. The PostgreSQL grammar allows
> specifying DEFAULT in XMLNAMESPACES, but it results in the following error:
> "ERROR:  DEFAULT namespace is not supported"

I also considered creating a new rule to avoid any conflict with
XMLTable, but as it didn't break any regression test and the result
would be pretty much the same as with "DEFAULT 'str'", I thought that
extending the existing rule would be the way to go.

SELECT * FROM XMLTABLE(XMLNAMESPACES(NO DEFAULT),
  '/rows/row'
  PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'
  COLUMNS a int PATH 'a');
ERROR:  DEFAULT namespace is not supported
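
For contrast, the prefixed form that XMLTABLE already supports works fine,
so only the DEFAULT/NO DEFAULT forms are affected (example against stock
PostgreSQL):

SELECT * FROM XMLTABLE(XMLNAMESPACES('http://x.y' AS x),
                       '/x:rows/x:row'
                       PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'
                       COLUMNS a int PATH 'x:a');
 a
----
 10
(1 row)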

What do you think?


> What would be the behavior of this change for XMLTABLE? Should this be
> allowed and the error message updated (maybe this will not be an error
> at all), or do we need to restrict users from using 'NO DEFAULT' with
> XMLTABLE?


Perhaps updating the error message would suffice?


>
> 2. Should we reuse the 'xml_namespaces' rule for XMLTable, as the
> definition is the same.


That would be good. I'm just afraid it would deviate a bit from the
scope of this patch - here I mean touching another function. Would you
suggest adding it as part of a patch series?

> 3. In this patch, the 'NO DEFAULT' behavior is like DEFAULT ''
> (empty URI); shouldn't it be more like 'DEFAULT NULL', to result in
> the following?
> SELECT xmlelement(NAME "root", xmlnamespaces(NO DEFAULT));
>  xmlelement
> ------------
>  <root/>
>
> instead of
>
> SELECT xmlelement(NAME "root", xmlnamespaces(NO DEFAULT));
>     xmlelement
> ------------------
>  <root xmlns=""/>
>
The idea of NO DEFAULT is pretty much to free an element (and its
children) from a previous DEFAULT in the same scope.

SELECT
  xmlserialize(DOCUMENT
    xmlelement(NAME "root",
  xmlnamespaces(DEFAULT 'http:/x.y/ns1'),
  xmlelement(NAME "foo",
    xmlnamespaces(NO DEFAULT))
  ) AS text INDENT);

         xmlserialize
------------------------------
 <root xmlns="http:/x.y/ns1">+
     <foo xmlns=""/>         +
 </root>
(1 row)


I believe this behaviour might be confusing if NO DEFAULT is used in the
root element, as there is no previously declared namespace there. Perhaps
making NO DEFAULT behave like DEFAULT NULL only in the root element
would make things clearer? The SQL/XML spec doesn't say anything
specific about it, but DB2 had the same thought [1]. For reference, here
are the regress tests [2] of this patch tested with the DB2 implementation.

> On Sat, 21 Dec 2024 at 14:57, Pavel Stehule  wrote:
>> +1
>>
>> Pavel

rebase in v2 attached - due to changes in gram.y

Thanks a lot

Best, Jim

1 - https://dbfiddle.uk/0QsWlfZR
2 - https://dbfiddle.uk/SyiDfXod
From 93422b076b2d270ca9293cf4e0be3da62bb6af1c Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 26 Dec 2024 14:40:26 +0100
Subject: [PATCH v2] Add XMLNamespaces option to XMLElement

This patch adds the scoped option XMLNamespaces to XMLElement,
as described in ISO/IEC 9075-14:2023, 11.2 XML lexically scoped
options:

xmlnamespaces(uri AS prefix, ...)
xmlnamespaces(DEFAULT uri, ...)
xmlnamespaces(NO DEFAULT, ...)

* prefix: Namespace's prefix.
* uri: Namespace's URI.
* DEFAULT uri: Specifies the DEFAULT namespace to use within
the scope of a namespace declaration.
* NO DEFAULT: Specifies that NO DEFAULT namespace is to be
used within the scope of a namespace declaration.

== Examples ==

SELECT xmlelement(NAME "foo", xmlnamespaces('http:/x.y' AS bar));
  xmlelement
--
 

SELECT xmlelement(NAME "foo", xmlnamespaces(DEFAULT 'http:/x.y'));
xmlelement
--
 

 SELECT xmlelement(NAME "foo", xmlnamespaces(NO DEFAULT));
   xmlelement
-
 

Tests and documentation were updated accordingly.
---
 doc/src/sgml/func.sgml  |  57 +++-
 src/backend/parser/gram.y   |  73 +++---
 src/backend/parser/parse_expr.c |  73 ++
 src/backend/utils/adt/xml.c |  32 -
 src/include/nodes/primnodes.h   |   4 +-
 src/include/utils/xml.h |   6 +
 src/test/regress/expected/xml.out   | 205 
 src/test/regress/expected/xml_1.out | 151 
 src/test/regress/expected/xml_2.out | 205 
 src/test/regress/sql/xml.sql| 100 ++
 10 files changed, 884 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..b33663bc09 100644
--- a/doc/src/

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread wenhui qiu
Hi

As far as I know, more than 10,000 tables  of instances  are often
encountered,
So I insist that the maximum can be appropriately increased to 256MB,Can be
more adaptable to many table situations

On Thu, 26 Dec 2024 at 23:23, Robert Treat  wrote:

> On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra  wrote:
> >
> > Hi,
> >
> > On 12/23/24 07:35, wenhui qiu wrote:
> > > Hi Tomas
> > >  This is a great feature.
> > > + /*
> > > + * Define (or redefine) custom GUC variables.
> > > + */
> > > + DefineCustomIntVariable("stats_history.size",
> > > + "Sets the amount of memory available for past events.",
> > > + NULL,
> > > + &statsHistorySizeMB,
> > > + 1,
> > > + 1,
> > > + 128,
> > > + PGC_POSTMASTER,
> > > + GUC_UNIT_MB,
> > > + NULL,
> > > + NULL,
> > > + NULL);
> > > +
> > > RAM is in terabytes now, the statsHistorySize is 128MB ,I think can
> > > increase to store more history record ?
> > >
> >
> > Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> > for ~500k events, which seems good enough for most systems. Of course,
> > on systems with many relations might need more space, not sure.
> >
> > I was thinking about specifying the space in more natural terms, either
> > as amount of time ("keep 1 day of history") or number of entries ("10k
> > entries"). That would probably mean the memory can't be allocated as
> > fixed size.
> >
>
> Based on the above, a rough calculation is that this is enough for
> holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
> vacuums for 1000 tables. Most folks will fall somewhere in that range
> (and won't really need a year's history) but that seems like plenty
> for a default.
>
> > But maybe it'd be possible to just write the entries to a file. We don't
> > need random access to past entries (unlike e.g. pg_stat_statements), and
> > people won't query that very often either.
> >
>
> Yeah, workloads will vary, but it doesn't seem like they would vary more
> than query workloads do.
>
> Robert Treat
> https://xzilla.net
>


Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread Tomas Vondra
On 12/26/24 17:00, wenhui qiu wrote:
> Hi,
>
> As far as I know, instances with more than 10,000 tables are often
> encountered, so I insist that the maximum can be appropriately
> increased to 256MB, to better adapt to situations with many tables.
> 

If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
make a fundamental difference ...

Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
limit, or possibly removing it entirely (and accepting anything the
system can handle).


regards

-- 
Tomas Vondra





Re: Add Postgres module info

2024-12-26 Thread David Wheeler
On Mon, Dec 23, 2024, at 8:49 PM, Andrei Lepikhov wrote:

> Looking into the control file, I see that most parameters are 
> unnecessary for the library. Why do we have to maintain this file?

Most of the parameters apply to SQL extensions.

> The 'CREATE EXTENSION' statement would have made sense if we had 
> register/unregister hook machinery. Without that, it seems it is just 
> about maintaining the library's version and comments locally in a 
> specific database.

Well, either way you have to load the extension, either CREATE EXTENSION to 
load an SQL extension (and any related shared modules), or LOAD or 
*_preload_libraries to load a shared module. I propose to add support for 
shared-module-only extensions to CREATE/UPDATE/DROP EXTENSION. It would then 
both insert the version info in the database (from the control file, at least), 
and load the shared module(s).

> It would be interesting to read about your real-life cases that caused 
> your proposal.

They're in the first section of [1]. The desire to group all the files for an 
extension in a single directory led to a conflict with the existing LOAD
patterns, which in the final section of [1] I attempt to resolve by proposing a 
single way to manage *all* extensions, instead of the two separate patterns we 
have today.

Best,

David

[1]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/




Re: An improvement of ProcessTwoPhaseBuffer logic

2024-12-26 Thread Давыдов Виталий
Hi Michael,

> Here are two patches to address both issues:

Thank you for preparing the patches and the test simplification. My bad, I
overcomplicated it.

I am concerned about two-phase file name generation. The function
TwoPhaseFilePath() is pretty straightforward and unambiguous in 16 and
earlier versions. The logic of this function in 17+ seems more complex,
and I do not understand it clearly. But I guess it will work incorrectly
after moving to a newer epoch, because the epoch is calculated from
TransamVariables->nextXid, not from the problem xid. The same problem may
happen if we are in epoch 1 or greater: it will produce a wrong file
name, because the epoch will be obtained from the last xid, not from the
file name's xid. In other words, the function AdjustToFullTransactionId()
assumes that if xid > TransamVariables->nextXid, the xid is from the
previous epoch. That may not be the case in our scenario.

> I suspect that this can still be dangerous as-is while complicating the
> code with more possible paths for the removal of the 2PC files.
Agreed, but we could pass the file name into TwoPhaseFilePath() if one is
available, instead of creating two functions as in the patch. The cost:
two new if conditions. Using file names is pretty safe: once we read a
file and extract the xid from its name, we just pass that file name to
TwoPhaseFilePath(); if we don't have one, we try to generate it. Anyway,
I do not insist on it, I just want to discuss it.

With best regards,
Vitaly





Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread Robert Treat
On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra  wrote:
>
> Hi,
>
> On 12/23/24 07:35, wenhui qiu wrote:
> > Hi Tomas
> >  This is a great feature.
> > + /*
> > + * Define (or redefine) custom GUC variables.
> > + */
> > + DefineCustomIntVariable("stats_history.size",
> > + "Sets the amount of memory available for past events.",
> > + NULL,
> > + &statsHistorySizeMB,
> > + 1,
> > + 1,
> > + 128,
> > + PGC_POSTMASTER,
> > + GUC_UNIT_MB,
> > + NULL,
> > + NULL,
> > + NULL);
> > +
> > RAM is in terabytes now, the statsHistorySize is 128MB ,I think can
> > increase to store more history record ?
> >
>
> Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> for ~500k events, which seems good enough for most systems. Of course,
> on systems with many relations might need more space, not sure.
>
> I was thinking about specifying the space in more natural terms, either
> as amount of time ("keep 1 day of history") or number of entries ("10k
> entries"). That would probably mean the memory can't be allocated as
> fixed size.
>

Based on the above, a rough calculation is that this is enough for
holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
vacuums for 1000 tables. Most folks will fall somewhere in that range
(and won't really need a year's history) but that seems like plenty
for a default.
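
(Checking that against Tomas's ~500k figure:

SELECT 50 * 24 * 365 AS hourly_runs_50_tables,  -- 438000
       1000 * 365 AS daily_runs_1000_tables;    -- 365000

both fit comfortably under ~500k events.)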

> But maybe it'd be possible to just write the entries to a file. We don't
> need random access to past entries (unlike e.g. pg_stat_statements), and
> people won't query that very often either.
>

Yeah, workloads will vary, but it doesn't seem like they would vary more
than query workloads do.

Robert Treat
https://xzilla.net




Bug in v13 due to "Fix corruption when relation truncation fails."

2024-12-26 Thread Yura Sokolov

Good day.

Commit "Fix corruption when relation truncation fails." [0] makes 
smgrtruncate be called in a critical section. Unfortunately in version 
13 it leads to occasional call to palloc0 inside of critical section, 
which triggers Assert in debug builds:


- smgrtruncate calls mdtruncate
- mdtruncate may call register_dirty_segment
- register_dirty_segment calls RegisterSyncRequest
- RegisterSyncRequest calls ForwardSyncRequest
- ForwardSyncRequest may call CompactCheckpointerRequestQueue
- CompactCheckpointerRequestQueue may call palloc0 ...

In versions 14 and above, CompactCheckpointerRequestQueue() does check
for a critical section, thanks to commit "Fix bugs in MultiXact
truncation" [1], which was backported down to version 14, but not to 13.

We caught it in our private tests, so it is real.
Cherry-pick of [1] from 14 version to 13 solves the issue.

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;h=2280912165d
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;h=4c8e00ae9ae
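
For reference, the guard that [1] added (and that the cherry-pick brings
to 13) is essentially an early exit at the top of
CompactCheckpointerRequestQueue(), sketched here from memory, so check
the actual commit:

static bool
CompactCheckpointerRequestQueue(void)
{
	/* ... */

	/* Avoid memory allocations in a critical section. */
	if (CritSectionCount > 0)
		return false;

	/* ... existing compaction logic using palloc0() ... */
}

Returning false just tells the caller that the queue could not be
compacted, which the callers already handle.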

--

regards,
Yura Sokolov aka funny-falcon




Re: Statistics Import and Export

2024-12-26 Thread Bruce Momjian
On Thu, Dec 19, 2024 at 09:23:20PM -0800, Jeff Davis wrote:
> On Fri, 2024-12-13 at 00:22 -0500, Corey Huinker wrote:
> > Per offline conversation with Jeff, adding a --no-schema to pg_dump
> > option both for completeness (we already have --no-data and --no-
> > statistics), but users who previously got the effect of --no-schema
> > did so by specifying --data-only, which suppresses statistics as
> > well.
> > 
> > 0001-0005 - changes to pg_dump/pg_upgrade
> 
> Attached is a version 36j where I consolidated these patches and
> cleaned up the documentation. It doesn't make a lot of sense to commit
> them separately, because as soon as the pg_dump changes are there, the
> pg_upgrade test starts showing a difference until it starts using the -
> -no-data option.
> 
> The biggest functional change is the way dependencies are handled for
> matview stats. Materialized views ordinarily end up in
> SECITON_PRE_DATA, but in some cases they can be postponed to
> SECTION_POST_DATA. You solved that by always putting the matview stats
> in SECTION_POST_DATA.
> 
> I took a different approach here and, when the matview is postponed,
> also postpone the matview stats. It's slightly more code, but it felt
> closer to the rest of the structure, where postponing is a special case
> (that we might be able to remove in the future).

I am confused by this:

Add options --with-statistics/--no-statistics to pg_upgrade
to enable/disable transferring of statistics to the upgraded
cluster. The default is --with-statistics.

If statistics is the default for pg_upgrade, why would we need a
--with-statistics option?

Also, I see a misspelling:

+   printf(_("  --no-statisttics  do not import statistics from 
old cluster\n"));
   --

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: Statistics Import and Export

2024-12-26 Thread Bruce Momjian
On Wed, Dec 11, 2024 at 10:49:53PM -0500, Corey Huinker wrote:
> From cf4e731db9ffaa4e89d7c5d14b32668529c8c89a Mon Sep 17 00:00:00 2001
> From: Corey Huinker 
> Date: Fri, 8 Nov 2024 12:27:50 -0500
> Subject: [PATCH v34 11/11] Add --force-analyze to vacuumdb.
> 
> The vacuumdb options of --analyze-in-stages and --analyze-only are often
> used after a restore from a dump or a pg_upgrade to quickly rebuild
> stats on a database.
> 
> However, now that stats are imported in most (but not all) cases,
> running either of these commands will be at least partially redundant,
> and will overwrite the stats that were just imported, which is a big
> POLA violation.
> 
> We could add a new option such as --analyze-missing-in-stages, but that
> wouldn't help the userbase that has grown accustomed to running
> --analyze-in-stages after an upgrade.
> 
> The least-bad option to handle the situation is to change the behavior
> of --analyze-only and --analyze-in-stages to only analyze tables which
> were missing stats before the vacuumdb started, but offer the
> --force-analyze flag to restore the old behavior for those who truly
> wanted it.

I am _again_ not happy with this part of the patch.  Please reply to the
criticism in my November 19th email:

https://www.postgresql.org/message-id/zz0t1benifdnx...@momjian.us

rather than ignoring it and posting the same version of the patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: Removing unused parameter in compute_expr_stats

2024-12-26 Thread David Rowley
On Thu, 26 Dec 2024 at 22:41, Ilia Evdokimov wrote:
> I'm sharing a small patch that removes an unused parameter (totalrows)
> from compute_expr_stats function in extended_stats.c .

Thanks. It's a static function, so I agree that there's no need to
keep an unused parameter.

Pushed.

David




Re: Add Postgres module info

2024-12-26 Thread Andrei Lepikhov

On 12/27/24 01:26, David Wheeler wrote:
> On Mon, Dec 23, 2024, at 8:49 PM, Andrei Lepikhov wrote:
>> Looking into the control file, I see that most parameters are
>> unnecessary for the library. Why do we have to maintain this file?
>
> Well, either way you have to load the extension, either CREATE EXTENSION
> to load an SQL extension (and any related shared modules), or LOAD or
> *_preload_libraries to load a shared module. I propose to add support for
> shared-module-only extensions to CREATE/UPDATE/DROP EXTENSION. It would
> then both insert the version info in the database (from the control file,
> at least), and load the shared module(s).

I still can't get your point.
We intentionally wrote a library, not an extension. Given its usage and
upgrade patterns, it works across the whole instance, in any database, or
locally in a single backend, and its impact ends at the end of its life.

Also, it doesn't maintain any object in the database and is managed by
GUCs. For example, my libraries add query tree transformation/path
recommendations to the planner. They don't depend on a database, don't
maintain DSM segments, and users sometimes want to use them in specific
backends, not databases - in a backend dedicated to analytic queries,
without adding overhead to the backends picked out for short queries.
For what reason do I need to add complexity and call 'CREATE EXTENSION'
here, adding version info only in a specific database? Just because of a
formal one-directory structure?


--
regards, Andrei Lepikhov




RE: Windows meson build

2024-12-26 Thread Kohei Harikae (Fujitsu)
Hi,

Thank you for your advice.
I added a sample setting PKG_CONFIG_PATH based on your advice.

How do you think about this?

Regards,
Kohei Harikae

-Original Message-
From: Vladlen Popolitov  
Sent: Wednesday, December 25, 2024 7:28 PM
To: Harikae, Kohei/張替 浩平 
Cc: 'Andres Freund' ; 'pgsql-hackers@lists.postgresql.org' 

Subject: Re: Windows meson build

Kohei Harikae (Fujitsu) wrote on 2024-12-18 05:05:
Hi!
> Hi,
> 
> I apologize for this late reply.
> 
> As you advised, I was able to create the ".pc" file manually and set 
> the dependencies using pkgconf.
> Thank you.
> 
> However, I have read the PostgreSQL Documentation and did not realize 
> that pkgconf is required.
> 
> 17.4.3.2. PostgreSQL Features
> -Dgssapi
>   On many systems, the GSSAPI system (a part of the MIT Kerberos
> installation) is not installed in a location that is searched by 
> default (e.g., /usr/include, /usr/lib).
>   In those cases, PostgreSQL will query pkg-config to detect the 
> required compiler and linker options.
> -Dlibxml
>   To use a libxml2 installation that is in an unusual location, you 
> can set pkg-config-related environment variables (see its documentation).
> 
> Only a few options have a pkg-config description.
> It would be easier to understand pkg-config in the "17.4.3. meson 
> setup Options" section.
> Below is my proposal for the document description.
> How do you think about this?
> 
> 17.4.3. meson setup Options
> PostgreSQL will query pkg-config to detect the required compiler and 
> linker options.
> To use a module installed in an unusual location for a given feature
> (e.g. -Dgssapi, -Dlibxml), set pkg-config-related environment variables
> (see its documentation).

Meson has a more straightforward solution. It uses the environment
variable PKG_CONFIG_PATH, which can store the list of directories with
all .pc files. Meson uses this list, in this specific order, to build the
project. As written in the meson documentation (in the source of the open
source project):
# Instead of relying on pkg-config or pkgconf to provide -L flags in a
# specific order, we reorder library paths ourselves, according to the
# order specified in PKG_CONFIG_PATH. See:
# https://github.com/mesonbuild/meson/issues/4271
#
It would be clearer if the PostgreSQL documentation said that all
directories with .pc files are listed in this one variable.

By the way, if you need to add, e.g., the libxml library to the
PostgreSQL build, you can:
1) build and install it using, e.g., vcpkg:
  vcpkg install libxml2:x64-windows --x-install-root=c:\pbuild\libxml2
2) add the path to the pkgconfig directory to PKG_CONFIG_PATH:
  SET PKG_CONFIG_PATH=c:\pbuild\libxml2\x64-windows\lib\pkgconfig;%PKG_CONFIG_PATH%
3) proceed with the build steps:
  meson setup c:\builddir\... --prefix=c:\postgresdir\...

It is just one install and one manipulation of an environment variable.
No indeterminate behaviour from other variables used by specific
packages, and no problems with library ordering.

> 
> Regards,
> Kohei Harikae
> 
> -Original Message-
> From: Andres Freund 
> Sent: Thursday, November 7, 2024 4:05 AM
> To: Harikae, Kohei/張替 浩平 
> Cc: 'pgsql-hackers@lists.postgresql.org' 
> 
> Subject: Re: Windows meson build
> 
> Hi,
> 
> On 2024-11-05 06:32:51 +, Kohei Harikae (Fujitsu) wrote:
>> I do not use pkgconf in my Windows environment.
>> In my Windows environment, I could not build the following OSS with 
>> meson.
>> - 0001 icu
>> - 0002 libxml
>> - 0003 libxslt
>> - 0004 lz4
>> - 0005 tcl
>> - 0006 zlib
>> - 0007 zstd
>> 
>> [1]thread, I created a patch like the one in the attached file, and 
>> now I can build.
>> Would you like to be able to build OSS with Windows meson without 
>> using pkgconf?
> 
> You can use pkgconf or cmake for the dependencies.  I don't want to 
> add "raw"
> dependency handling for every dependency, they each build in too many 
> variants for that to be a sensible investment of time.
> 
> 
> Greetings,
> 
> Andres Freund

--
Best regards,

Vladlen Popolitov.




Re: Using Expanded Objects other than Arrays from plpgsql

2024-12-26 Thread Tom Lane
Michel Pelletier  writes:
> On Mon, Dec 23, 2024 at 8:26 AM Tom Lane  wrote:
>> 2. If the problem is primarily with passing the same object to
>> multiple parameters of a function, couldn't you detect and optimize
>> that within the function?

> Ah that's a great idea, and it works beautifully!  Now I can do an
> efficient triangle count without even needing a function; see below,
> expand_matrix is only called once:

Nice!

> That pretty much resolves my main issues.  I'm still in an exploratory
> phase but I think this gets me pretty far.  Is this something that has to
> wait for 18 to be released?  Also do you need any further testing or code
> reviewing from me?

Yeah, this is not the kind of change we would back-patch, so it'll
have to wait for v18 (or later if people are slow to review it :-( ).

I don't think there's anything more we need from you, though of course
you should keep working on your code and report back if you hit
anything that needs further improvement on our end.

regards, tom lane




WAL-logging facility for pgstats kinds

2024-12-26 Thread Michael Paquier
Hi all,

While brainstorming on the contents of the thread I have posted a
couple of days ago, I have been looking at what could be done so as
pgstats and WAL-logging could work together.  This was point 2) from
this message:
https://www.postgresql.org/message-id/z2tblemfuofzy...@paquier.xyz

While considering the point that not all stats data is worth
replicating, I have settled on the following properties that are
nice to have across the board:
- A pgstats kind should be able to WAL-log data that it gets to decide
on, including versioning of the data.
- The data kind should be able to decide how this data is handled at
recovery (different persistency depending on standby vs crash
recovery, for one).
- Having to add one RMGR for each stats kind is not going to scale,
especially as we can control the redo path using a callback that is part
of PgStat_KindInfo.  For custom kinds, this enforces some validation
based on whether the stats library has actually been loaded at startup
with shared_preload_libraries.
- It is nice for each out-of-core stats kind to not have to load and
define a custom RMGR, duplicating what this central facility is doing.
- The persistency of the stats data is better across crashes: this
approach makes the cleanup of orphaned pgstats entries easier as this
can be enforced at replay by each stats kind, if necessary, and it
is also possible to force the creation of stats data.
- For variable-numbered stats, the WAL-logging can be timed with the
timing where any pending stats are flushed, if there is some.

The first use case I can think of is table stats, to prevent autovacuum
from committing seppuku on tables without stats after a crash; there
we'd want to split out and expose some DML stats.  Scan stats should
remain untouched on standbys, for example.  There may be other stats
data worth replicating; all of it could use this new facility.

Attached is the result of this investigation: a new backend facility
with which pgstats kinds can WAL-log data at will using a new RMGR
called "PgStat".  The most interesting
part is pgstat_xlog_data() in a new file called pgstat_xlog.c, that
stats kinds can use as an entry point to WAL-log data structures with
a XLogRegisterData() managed by a new record structure called
xl_pgstat_data.
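
To illustrate how a stats kind would use it, a call would look roughly
like this (a sketch only: the struct fields and the exact signature of
pgstat_xlog_data() are assumptions here, the real definitions live in
0003):

/* kind-specific payload, versioned by the stats kind itself */
typedef struct MyStatsWalData
{
	uint32		version;	/* assumed: kind-managed versioning */
	int64		numcalls;
} MyStatsWalData;

static void
my_kind_log_stats(Oid dboid, uint64 objid, int64 numcalls)
{
	MyStatsWalData data;

	data.version = 1;
	data.numcalls = numcalls;

	/* assumed signature: wraps XLogRegisterData() and emits an
	 * xl_pgstat_data record through the new "PgStat" RMGR, to be
	 * replayed by the kind's redo callback in PgStat_KindInfo */
	pgstat_xlog_data(PGSTAT_KIND_MY_KIND, dboid, objid,
					 (char *) &data, sizeof(data));
}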

For clarity, the patch set has been split into several pieces, I hope
rather edible:
- 0001, a fix I've posted on a different thread [1], used in patch
0004 to test this new facility.
- 0002, a refactoring piece to be able to expose stats kind data into
the frontend (for pg_waldump).
- 0003 is the core backend piece, introducing the WAL-logging routine
and the new RMGR.
- 0004, that provides tests for the new backend facility via a custom
stats kind.  Requires 0001.

I am going to attach it to the next CF.

Comments or opinions are welcome.

[1]: https://www.postgresql.org/message-id/z23zce4w1djuk...@paquier.xyz
--
Michael
From dc7f5ac52407c8916a560572cfb2c0adf183dbab Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 27 Dec 2024 09:16:02 +0900
Subject: [PATCH v1 1/4] injection_points: Tweak variable-numbered stats to
 work with pending data

As coded, the module was not using pending entries to store its data
locally before a flush to the central dshash done by a timed
pgstat_report_stat() call.  Hence, the flush callback was defined but
ended up unused.  As a template, the new way is more efficient than the
original logic of updating the shared memory entries directly, as it
reduces the interactions with the pgstats hash table in shared memory.

injection_stats_flush_cb() was also missing a pgstat_unlock_entry(), so
add one while on it.
---
 .../modules/injection_points/injection_stats.c| 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index e16b9db284..21d5c10f39 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -80,6 +80,9 @@ injection_stats_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 		return false;
 
 	shfuncent->stats.numcalls += localent->numcalls;
+
+	pgstat_unlock_entry(entry_ref);
+
 	return true;
 }
 
@@ -127,13 +130,13 @@ pgstat_create_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;
 
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-			PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+		  PGSTAT_INJ_IDX(name), NULL);
+
 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;
 
 	/* initialize shared memory data */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-	pgstat_unlock_entry(entry_ref);
 }
 
 /*
@@ -168,16 +171,14 @@ pgstat_report_inj(const char *name)
 	if (!inj_stats_loaded || !inj_

Re: An improvement of ProcessTwoPhaseBuffer logic

2024-12-26 Thread Michael Paquier
On Thu, Dec 26, 2024 at 06:11:25PM +0300, Давыдов Виталий wrote:
> I am concerned about two-phase file name generation. The function
> TwoPhaseFilePath() is pretty straightforward and unambiguous in 16 and
> earlier versions. The logic of this function in 17+ seems more
> complex, and I do not understand it clearly. But I guess it will work
> incorrectly after moving to a newer epoch, because the epoch is
> calculated from TransamVariables->nextXid, not from the problem
> xid. The same problem may happen if we are in epoch 1 or greater: it
> will produce a wrong file name, because the epoch will be obtained
> from the last xid, not from the file name's xid. In other words, the
> function AdjustToFullTransactionId() assumes that if xid >
> TransamVariables->nextXid, the xid is from the previous epoch. That
> may not be the case in our scenario.

Yeah.  At this stage, we can pretty much say that the whole idea of
relying on AdjustToFullTransactionId() is broken, because we would build
2PC file names based on wrong assumptions, while orphaned files could
be in the far past or far future depending on the epoch.

TBH, we'll live better if we remove AdjustToFullTransactionId() and
spread the knowledge of FullTransactionIds a bit more, so that the 2PC
file path is built correctly in v17~.  I've been playing with this code
for a couple of hours and finished with the attached patch.  I have
wondered if ReadTwoPhaseFile() should gain the same knowledge as
TwoPhaseFilePath(), but decided to limit the invasiveness because we
always call ReadTwoPhaseFile() once we don't have any orphaned files,
for all the phases of recovery.  So while this requires a bit more
logic that depends on FullTransactionIdFromEpochAndXid() and
ReadNextFullTransactionId() to build a FullTransactionId and get the
current epoch, that's still acceptable as we only store an XID in the
2PC file.
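
Concretely, callers now build the FullTransactionId themselves, roughly
like this (a condensed sketch of the patch's approach, not the code
verbatim):

	/* derive the current epoch from the next full transaction ID ... */
	FullTransactionId next_fxid = ReadNextFullTransactionId();
	uint32		epoch = EpochFromFullTransactionId(next_fxid);
	FullTransactionId fxid;

	/* ... and combine it with the XID stored in the 2PC file */
	fxid = FullTransactionIdFromEpochAndXid(epoch, xid);

	/* the file-handling routines then take the full XID, e.g. */
	RemoveTwoPhaseFile(fxid, true);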

So please see the attached.  You will note that RemoveTwoPhaseFile(),
ProcessTwoPhaseBuffer() and TwoPhaseFilePath() now require a
FullTransactionId, hence callers need to think about the epoch to use.
That should limit future errors compared to passing the file name as
optional argument.

What do you think?
--
Michael
From bb250c0cd303f0d8db0fee2d26353bf0e0cb1275 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 27 Dec 2024 16:34:44 +0900
Subject: [PATCH v2] Fix failure with incorrect epoch handling for 2PC files at
 recovery

At the beginning of recovery, an orphaned two-phase file in an epoch
different from the one defined in the checkpoint record could not be
removed, because of the assumption that AdjustToFullTransactionId()
relies on: that all files would be either from the current epoch or from
the epoch before that.

If the checkpoint epoch was 0 while the 2PC file was orphaned and in the
future, AdjustToFullTransactionId() would underflow the epoch used to
build the 2PC file path.  In non-assert builds, this would create a
WARNING message referring to a 2PC file with an epoch of "FFFFFFFF" (or
UINT32_MAX), as an effect of the underflow calculation.

Some tests are added with dummy 2PC files in the past and the future,
checking that these are properly removed.

Issue introduced by 5a1dfde8334b.

Reported-by: Vitaly Davydov
Discussion: https://postgr.es/m/13b5b6-676c3080-4d-531db900@47931709
Backpatch-through: 17
---
 src/backend/access/transam/twophase.c | 231 +-
 src/test/recovery/t/009_twophase.pl   |  31 
 2 files changed, 184 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 49be1df91c..01f5d728be 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -221,13 +221,13 @@ static void ProcessRecords(char *bufptr, TransactionId xid,
 static void RemoveGXact(GlobalTransaction gxact);
 
 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
-static char *ProcessTwoPhaseBuffer(TransactionId xid,
+static char *ProcessTwoPhaseBuffer(FullTransactionId xid,
    XLogRecPtr prepare_start_lsn,
    bool fromdisk, bool setParent, bool setNextXid);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
 const char *gid, TimestampTz prepared_at, Oid owner,
 Oid databaseid);
-static void RemoveTwoPhaseFile(TransactionId xid, bool giveWarning);
+static void RemoveTwoPhaseFile(FullTransactionId fxid, bool giveWarning);
 static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len);
 
 /*
@@ -926,42 +926,9 @@ TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
 /* State file support	*/
 //
 
-/*
- * Compute the FullTransactionId for the given TransactionId.
- *
- * The wrap logic is safe here because the span of active xids cannot exceed one
- * epoch at any given time.
- */
-static inline FullTransactionId
-AdjustToFullTransactionId(TransactionId xid)
-{
-

Re: Add Postgres module info

2024-12-26 Thread Andrei Lepikhov

On 12/24/24 10:42, Yurii Rashkovskii wrote:
> On Mon, Dec 16, 2024 at 12:02 PM Andrei Lepikhov wrote:
> I've reviewed the patch, and it is great that you support more flexible
> versioning now. I am just wondering a bit about the case where
> `minfo->name` can be `NULL` but `minfo->version` isn't, or where both
> are `NULL` – should we skip any of these?
Depends. I wrote the code that way so as not to force a maintainer to
initialise all the fields; remember, the structure may grow in the future.
But I am open to changing that logic. Do you have any specific rule on
which fields may be empty and which must be initialised? Do you think the
maintainer must fill all fields with non-zero-length constants?


Also, I've added this patch to commitfest:
https://commitfest.postgresql.org/51/5465/

--
regards, Andrei Lepikhov




Re: date_trunc invalid units with infinite value

2024-12-26 Thread Michael Paquier
On Tue, Dec 24, 2024 at 04:33:47PM +0900, Michael Paquier wrote:
> I am planning to get this one applied around the end of this week on
> Friday for HEAD, that should be enough if there are comments and/or
> objections.

And done for now.  If there are any remarks and/or objections, of
course feel free.
--
Michael




Re: Add Postgres module info

2024-12-26 Thread Yurii Rashkovskii
On Fri, Dec 27, 2024 at 8:34 AM Andrei Lepikhov wrote:

> On 12/24/24 10:42, Yurii Rashkovskii wrote:
> > On Mon, Dec 16, 2024 at 12:02 PM Andrei Lepikhov wrote:
> > I've reviewed the patch, and it is great that you support more flexible
> > versioning now. I am just wondering a bit about the case where
> > `minfo->name` can be `NULL` but `minfo->version` isn't, or where both
> > are `NULL` – should we skip any of these?
> Depends. I wrote the code that way so as not to force a maintainer to
> initialise all the fields; remember, the structure may grow in the future.
> But I am open to changing that logic. Do you have any specific rule on
> which fields may be empty and which must be initialised? Do you think the
> maintainer must fill all fields with non-zero-length constants?


After more thinking, I'll concede that not doing anything about null
metadata is probably better – making the function always return the list of
modules, regardless of whether any metadata was supplied. It's beneficial
to be able to get the entire list of modules regardless of metadata.

The only other minor concern I have left is that some modules might have a
clashing name or may change the name during the extension's lifetime
(happened to some of my early work). Providing a permanent identifier and a
human-digestible identifier may be worth it.
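
To sketch the shape of that idea (purely hypothetical; none of these
field names exist in the posted patch):

    typedef struct PGModuleInfo
    {
        const char *id;         /* permanent identifier, e.g. a UUID; stable */
        const char *name;       /* human-digestible name; may change */
        const char *version;
    } PGModuleInfo;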


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-12-26 Thread vignesh C
On Tue, 24 Dec 2024 at 17:07, Nisha Moond  wrote:
>
> On Fri, Dec 20, 2024 at 3:12 PM Amit Kapila  wrote:
>>
>> On Mon, Dec 16, 2024 at 4:10 PM Nisha Moond  wrote:
>> >
>> > Here is the v56 patch set with the above comments incorporated.
>> >
>>
>> Review comments:
>> ===
>> 1.
>> + {
>> + {"idle_replication_slot_timeout", PGC_SIGHUP, REPLICATION_SENDING,
>> + gettext_noop("Sets the duration a replication slot can remain idle before "
>> + "it is invalidated."),
>> + NULL,
>> + GUC_UNIT_MS
>> + },
>> + &idle_replication_slot_timeout_ms,
>>
>> I think users are going to keep idle_slot timeout at least in hours.
>> So, millisecond seems the wrong choice to me. I suggest keeping the
>> units in minutes. I understand that writing a test would be
>> challenging as spending a minute or more on one test is not advisable.
>> But I don't see any test testing the other GUCs that are in minutes
>> (wal_summary_keep_time and log_rotation_age). The default value should
>> be one day.
>
>
> +1
> - Changed the GUC unit to "minute".
>
> Regarding the tests, we have two potential options:
>  1) Introduce an additional "debug_xx" GUC parameter with units of seconds or 
> milliseconds, only for testing purposes.
>  2) Skip writing tests for this, similar to other GUCs with units in minutes.
>
> IMO, adding an additional GUC just for testing may not be worthwhile. It's 
> reasonable to proceed without the test.
>
> Thoughts?
>
> The attached v57 patch-set addresses all the comments. I have kept the test 
> case in the patch for now, it takes 2-3 minutes to complete.

Few comments:
1) We have disabled the similar configuration max_slot_wal_keep_size
by setting it to -1; as this GUC is along similar lines, should we
disable it too and let the user configure it?
+/*
+ * Invalidate replication slots that have remained idle longer than this
+ * duration; '0' disables it.
+ */
+int		idle_replication_slot_timeout_min = HOURS_PER_DAY * MINS_PER_HOUR;
+

2) I felt this is existing behavior, so this change can also be moved
to the 0001 patch:
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index a586156614..199d7248ee 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2566,7 +2566,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
   
 The time when the slot became inactive. NULL if the
-slot is currently being streamed.
+slot is currently being streamed. If the slot becomes invalidated,
+this value will remain unchanged until server shutdown.


3) Can we change the comment below to "We don't allow the value of
idle_replication_slot_timeout other than 0 during the binary upgrade.
See start_postmaster() in pg_upgrade for more details.":
+ * The idle_replication_slot_timeout must be disabled (set to 0)
+ * during the binary upgrade.
+ */
+bool
+check_idle_replication_slot_timeout(int *newval, void **extra, GucSource source)

Regards,
Vignesh




Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread Michael Paquier
On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote:
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
> 
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).

+DefineCustomIntVariable("stats_history.size",
+                        "Sets the amount of memory available for past events.",

How about some time-based retention?  Data size can be hard to reason
about for the end user, while there would be a set of users who would
want to retain data for the past week, month, etc.  If both a size and
a time upper-bound are defined, then entries that match one condition
or the other are removed.
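
For illustration, a time-based companion knob could look like the
following sketch ("stats_history.max_age" and its defaults are names
invented here, not something from the patch):

    /* Hypothetical retention GUC next to stats_history.size. */
    static int	stats_history_max_age = 0;

    DefineCustomIntVariable("stats_history.max_age",
                            "Sets the maximum age of past events to retain.",
                            NULL,
                            &stats_history_max_age,
                            7 * 24 * 60 * 60,	/* boot value: one week */
                            0, INT_MAX,
                            PGC_SIGHUP,
                            GUC_UNIT_S,
                            NULL, NULL, NULL);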

+checkpoint_log_hook(
+    CheckpointStats.ckpt_start_t,           /* start_time */
+    CheckpointStats.ckpt_end_t,             /* end_time */
+    (flags & CHECKPOINT_IS_SHUTDOWN),       /* is_shutdown */
+    (flags & CHECKPOINT_END_OF_RECOVERY),   /* is_end_of_recovery */
+    (flags & CHECKPOINT_IMMEDIATE),         /* is_immediate */
+    (flags & CHECKPOINT_FORCE),             /* is_force */
+    (flags & CHECKPOINT_WAIT),              /* is_wait */
+    (flags & CHECKPOINT_CAUSE_XLOG),        /* is_wal */
+    (flags & CHECKPOINT_CAUSE_TIME),        /* is_time */
+    (flags & CHECKPOINT_FLUSH_ALL),         /* is_flush_all */
+    CheckpointStats.ckpt_bufs_written,      /* buffers_written */
+    CheckpointStats.ckpt_slru_written,      /* slru_written */
+    CheckpointStats.ckpt_segs_added,        /* segs_added */
+    CheckpointStats.ckpt_segs_removed,      /* segs_removed */
+    CheckpointStats.ckpt_segs_recycled,     /* segs_recycled */

That's a lot of arguments.  CheckpointStatsData and the various
CHECKPOINT_* flags are exposed, why not just send these values to the
hook?
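
Something like the following sketch, perhaps (the reduced signature is
the suggestion, not what the patch currently does; the hook type name
is assumed):

    /*
     * Hand over the flags bitmask and the whole stats struct instead of
     * unpacking every field into its own argument.
     */
    typedef void (*checkpoint_log_hook_type) (int flags,
                                              const CheckpointStatsData *stats);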

For v1-0001 as well, I'd suggest some grouping with existing
structures, or exposing these structures so that they can be reused by
out-of-core code via the proposed hook.  More arguments lead to more
mistakes that could easily be avoided.
--
Michael




Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-12-26 Thread Michael Paquier
On Thu, Dec 26, 2024 at 06:07:45PM -0500, Bruce Momjian wrote:
> On Thu, Dec 26, 2024 at 05:08:36PM -0500, Tom Lane wrote:
>> I think removal is not happening.  The documentation for attndims
>> already says
>> 
>> Number of dimensions, if the column is an array type; otherwise
>> 0. (Presently, the number of dimensions of an array is not
>> enforced, so any nonzero value effectively means “it's an array”.)
>> 
>> and as far as I know that is completely correct.  (Hence, your gripe
>> is not that it's not "accurate", rather that it's not "precise".)
>> 
>> There isn't a comparable disclaimer under typndims, but there probably
>> should be.

FWIW, basically all applications I've seen rely on attndims and typndims
only for non-zero comparisons, to save joins with other catalogs.

> Using the queries in that URL, I see:
> 
>   CREATE TABLE test (data integer, data_array integer[5][5]);
>   CREATE TABLE test2 (LIKE test);
>   CREATE TABLE test3 AS SELECT * FROM test;
>   SELECT relname, attndims
>   FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
>   WHERE attname = 'data_array';
> 
>    relname | attndims
>   ---------+----------
>    test    |        2
> -->test2   |        0
> -->test3   |        0

Ugh.  LIKE and CTAS are wrong here with what they add in pg_attribute.

> and run the query again I get:
> 
>   SELECT relname, attndims
>   FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
>   WHERE attname = 'data_array';
> 
>    relname | attndims
>   ---------+----------
>    test    |        1
>    test2   |        1
>    test3   |        1
> 
> There is clearly something not ideal about this behavior.

Still, these are better post-restore.
--
Michael




Fix handling of injection_points regarding pending stats

2024-12-26 Thread Michael Paquier
Hi all,

While doing more stuff related to pgstats and WAL, I have bumped into
the fact that the module injection_points defines a pending flush
callback for its variable-numbered stats, but does not handle its
entries so that they are accumulated locally first, with a final flush
timed by pgstat_report_stat() that would itself call the pending flush
callback.  This is less efficient, because it touches the pgstats
dshash more often than necessary.

This is caused by the use of pgstat_get_entry_ref_locked() rather than
pgstat_prep_pending_entry(), which would make sure to time the flush of
any pending entries appropriately depending on pgstat_report_stat().
I'd like to fix the module so that the flush timing is controlled like
all the other stats kinds, which is a better and more efficient
practice to have anyway, as this code should stand as a template for
custom pgstats kinds.
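
Roughly, the two idioms side by side (a sketch based on the in-core
pgstat APIs, not the patch itself):

    /* What the module did: lock the shared entry and update it now. */
    entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
                                            PGSTAT_INJ_IDX(name), false);
    /* ... update entry_ref->shared_stats directly ... */
    pgstat_unlock_entry(entry_ref);

    /* The pending way: stage the update locally; a later timed
     * pgstat_report_stat() drives the kind's flush callback. */
    entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
                                          PGSTAT_INJ_IDX(name), NULL);
    /* ... update the entry; flushed via injection_stats_flush_cb() ... */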

While doing so, I have also noticed that injection_stats_flush_cb()
missed a pgstat_unlock_entry() when entries get locked.

All that is addressed in the attached patch.  Feel free to comment if
you have anything to add.

Thanks,
--
Michael
From d0aae1f873ec4f0cde91d851d131767e7ca66171 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 27 Dec 2024 09:16:02 +0900
Subject: [PATCH] injection_points: Tweak variable-numbered stats to work with
 pending data

As coded, the module was not using pending entries to store its data
locally before a flush to the central dshash done by a timed
pgstat_report_stat() call.  Hence, the flush callback was defined, but
ended up unused.  As a template, the pending approach is more efficient
than the original logic of directly updating the shared memory entries,
as it reduces the interactions needed with the pgstats hash table in
shared memory.

injection_stats_flush_cb() was also missing a pgstat_unlock_entry(), so
add one, while on it.
---
 .../modules/injection_points/injection_stats.c| 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index e16b9db284..21d5c10f39 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -80,6 +80,9 @@ injection_stats_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 		return false;
 
 	shfuncent->stats.numcalls += localent->numcalls;
+
+	pgstat_unlock_entry(entry_ref);
+
 	return true;
 }
 
@@ -127,13 +130,13 @@ pgstat_create_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;
 
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-			PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+		  PGSTAT_INJ_IDX(name), NULL);
+
 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;
 
 	/* initialize shared memory data */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-	pgstat_unlock_entry(entry_ref);
 }
 
 /*
@@ -168,16 +171,14 @@ pgstat_report_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;
 
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-			PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+		  PGSTAT_INJ_IDX(name), NULL);
 
 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
 	/* Update the injection point statistics */
 	statent->numcalls++;
-
-	pgstat_unlock_entry(entry_ref);
 }
 
 /*
-- 
2.45.2





Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-12-26 Thread Tom Lane
Bruce Momjian  writes:
> So, if users are referencing attndims and the values are not accurate,
> how useful are they?  I was suggesting removal so people would stop
> relying on inaccurate information, or we correct attndims to be
> accurate.  Allowing people to continue relying on inaccurate information
> seems like the worst of all options.

I think removal is not happening.  The documentation for attndims
already says

Number of dimensions, if the column is an array type; otherwise
0. (Presently, the number of dimensions of an array is not
enforced, so any nonzero value effectively means “it's an array”.)

and as far as I know that is completely correct.  (Hence, your gripe
is not that it's not "accurate", rather that it's not "precise".)

There isn't a comparable disclaimer under typndims, but there probably
should be.

regards, tom lane




Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-12-26 Thread Bruce Momjian
On Thu, Dec 26, 2024 at 05:08:36PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > So, if users are referencing attndims and the values are not accurate,
> > how useful are they?  I was suggesting removal so people would stop
> > relying on inaccurate information, or we correct attndims to be
> > accurate.  Allowing people to continue relying on inaccurate information
> > seems like the worst of all options.
> 
> I think removal is not happening.  The documentation for attndims
> already says
> 
> Number of dimensions, if the column is an array type; otherwise
> 0. (Presently, the number of dimensions of an array is not
> enforced, so any nonzero value effectively means “it's an array”.)
> 
> and as far as I know that is completely correct.  (Hence, your gripe
> is not that it's not "accurate", rather that it's not "precise".)
> 
> There isn't a comparable disclaimer under typndims, but there probably
> should be.

First, my apologies about the URL;  it should have been:

https://www.postgresql.org/message-id/zvwi_ozt8z9mc...@momjian.us

Using the queries in that URL, I see:

CREATE TABLE test (data integer, data_array integer[5][5]);

CREATE TABLE test2 (LIKE test);

CREATE TABLE test3 AS SELECT * FROM test;

    SELECT relname, attndims
    FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
    WHERE attname = 'data_array';

     relname | attndims
    ---------+----------
     test    |        2
-->  test2   |        0
-->  test3   |        0

Interestingly, if I dump and restore with:

	$ createdb test2; pg_dump test | psql test2

and run the query again I get:

    SELECT relname, attndims
    FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
    WHERE attname = 'data_array';

     relname | attndims
    ---------+----------
     test    |        1
     test2   |        1
     test3   |        1

There is clearly something not ideal about this behavior.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: Support for unsigned integer types

2024-12-26 Thread Bruce Momjian
On Sat, Dec  7, 2024 at 11:24:37AM -0500, Tom Lane wrote:
> Jack Bay  writes:
> > Would it be possible to add support for unsigned 64-bit and unsigned
> > 32-bit integers to postgresql?
> 
> This has been discussed before, and we've concluded that the impact
> on the numeric promotion hierarchy (that is, implicit-cast rules
> among the integer types) would probably be catastrophic, leading
> to problems like ambiguous-operator errors in many cases that were
> fine before.  Quick, is "42 + 1" an int32 or uint32 operation?
> 
> That could be avoided perhaps by measures like not having any
> implicit casts between the int and uint hierarchies, but then
> there'd be a corresponding loss of usability for the uint types.
> 
> Plus, the sheer magnitude of effort needed to build out a reasonable
> set of support (functions, operators, opclasses) for uint types seems
> daunting.
> 
> On the flip side, it'd be great to be able to use uint32 instead
> of bigint for the SQL representation of types like BlockNumber.
> But we couldn't roll in such a change transparently unless we make
> int-vs-uint casting fairly transparent, which seems problematic
> as per above.
> 
> Perhaps a sufficiently determined and creative person could put
> together a patch that'd be accepted, but it'd be a lot of work
> for uncertain reward.  I'm not aware that anyone is working on
> such a thing at present.

We do have the 'oid' data type, which is an unsigned 4-byte integer, but
it lacks the casts and operator support mentioned above:

    SELECT 42 + 1;
     ?column?
    ----------
           43

    SELECT 42::oid + 1;
    ERROR:  operator does not exist: oid + integer
    LINE 1: SELECT 42::oid + 1;
                           ^
    HINT:  No operator matches the given name and argument types. You might
    need to add explicit type casts.

    SELECT 42::oid + 1::oid;
    ERROR:  operator does not exist: oid + oid
    LINE 1: SELECT 42::oid + 1::oid;
                           ^
    HINT:  No operator matches the given name and argument types. You might
    need to add explicit type casts.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: Logical replication timeout

2024-12-26 Thread vignesh C
On Wed, 25 Dec 2024 at 13:55, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Marc,
>
> > Thanks again for this new patch.
> >
> > Unfortunately it does not compile (17.2 source):
>
> Right, because of the reason I posted [1].
>
> I updated the patch, which takes the same approach. It passes my CI.

Let's conduct some performance tests with varying numbers of spill
files (e.g., small ones like 1, 5, and 10, and larger ones like 100,
1000, and 10,000) along with different levels of concurrent
transactions. We can then compare the results with the current HEAD.

Regards,
Vignesh




Re: ERROR: corrupt MVNDistinct entry

2024-12-26 Thread Richard Guo
On Wed, Dec 25, 2024 at 6:36 PM Richard Guo  wrote:
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

Furthermore, even without extended statistics or expressional index
columns, the nullingrels can still cause issues with selectivity
estimates.  They may cause examine_variable to fail to identify a
valid RelOptInfo for an expression, leaving the relation size
unavailable, although it is required in many cases.  For instance,

create table t (a int, b int);
insert into t select i, i from generate_series(1,10)i;
analyze t;

-- in v16 and later
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
  QUERY PLAN
---
 Nested Loop Left Join  (cost=0.00..4.94 rows=1 width=24)
   Join Filter: (t1.a = t3.a)
   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=1 width=16)
 Filter: (t1.a = COALESCE(t2.a))
 ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
 ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
   ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
   ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
 Filter: (a < 8)
(9 rows)

-- in v15 and before
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
  QUERY PLAN
---
 Hash Left Join  (cost=1.21..5.04 rows=10 width=24)
   Hash Cond: (t1.a = t3.a)
   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=10 width=16)
 Filter: (t1.a = COALESCE(t2.a))
 ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
 ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
   ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
   ->  Hash  (cost=1.12..1.12 rows=7 width=8)
 ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
   Filter: (a < 8)
(10 rows)

In v16 and later, when calculating the join selectivity for "t1.a =
coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of
coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the
corresponding RelOptInfo is not identifiable.  This results in very
inaccurate join selectivity.

I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable().  I tried doing so and noticed a plan
diff in regression test join.sql.

@@ -2573,10 +2573,11 @@
->  Materialize
  ->  Seq Scan on int4_tbl t2
Filter: (f1 > 1)
- ->  Seq Scan on int4_tbl t3
+ ->  Materialize
+   ->  Seq Scan on int4_tbl t3
->  Materialize
  ->  Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)
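
For reference, the experiment is essentially one extra call in
examine_variable(); a rough sketch, with remove_nulling_relids() being
the existing helper from rewriteManip.c:

    /* Strip any nullingrels before matching the expression to stats;
     * root->outer_join_rels covers all outer joins in the query. */
    node = remove_nulling_relids((Node *) node, root->outer_join_rels, NULL);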

Thanks
Richard




Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-26 Thread Shubham Khanna
On Fri, Dec 27, 2024 at 11:30 AM vignesh C  wrote:
> > > > >
> The documentation requires a minor update: instead of specifying
> subscriptions, the user will specify multiple databases, and the
> subscription will be created on the specified databases. Documentation
> should be updated accordingly:
> +  
> +   Enables <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
> +   commit for the subscription. If there are multiple subscriptions
> +   specified, this option applies to all of them.
> +   The default is false.
>

I have fixed the given comment. The attached patch contains the
suggested changes.

Thanks and regards,
Shubham Khanna.


v9-0001-Add-support-for-two-phase-commit-in-pg_createsubs.patch
Description: Binary data


Re: Windows meson build

2024-12-26 Thread Vladlen Popolitov

Kohei Harikae (Fujitsu) wrote on 2024-12-27 04:31:

Hi,

Thank you for your advice.
I added a sample PKG_CONFIG_PATH setting based on your advice.

What do you think about this?

Regards,
Kohei Harikae



Hi,
From my point of view it looks good: current information is added and
excessive information is removed.
In the phrase:
"add the directory where the .pc file resides to PKG_CONFIG_PATH."
it would be better to add wording that PKG_CONFIG_PATH is a list of
directories separated by the PATH separator (";" on Windows, ":" on
POSIX). Otherwise the reader has to search for additional information
about this variable in the meson documentation.


--
Best regards,

Vladlen Popolitov.




Re: Pass ParseState as down to utility functions.

2024-12-26 Thread jian he
On Wed, Dec 25, 2024 at 5:29 PM Michael Paquier  wrote:
>
> On Mon, Dec 16, 2024 at 05:25:45PM +0800, jian he wrote:
> > i've split into 3 patches, feel free to merge them in any way.
> > v12-0001:  add error position for ATPrepAlterColumnType.
>
> For this one, why don't you do the same for undefined columns and
> USING with generated columns at least?  This looks half-baked.
>

I think I understand what you mean.
Please check the attached for changes within ATPrepAlterColumnType.


> > v12-0002:  add error position for these 3 functions:
> > transformColumnDefinition, transformAlterTableStmt, 
> > transformTableConstraint.
>
>  ERROR:  column "c" of relation "itest4" does not exist
> +LINE 1: ALTER TABLE itest4 ALTER COLUMN c ADD GENERATED ALWAYS AS ID...
> +  ^
>
> This one is kind of confusing?  The part that matters for the error is
> the column that does not exist, not the ADD GENERATED.
>
I agree this is confusing.

While looking at 0002: for
errmsg("conflicting NO INHERIT declarations for not-null constraints
on column \"%s\"", column->colname)
adding parser_errposition will not be very helpful. I think we need an
errhint.

for example:
create table notnull_tbl_fail (a int primary key constraint foo not
null no inherit);
ERROR:  conflicting NO INHERIT declarations for not-null constraints
on column "a"
The error message doesn't explicitly say that the primary key implies
an inherited not-null constraint.

Maybe we can change to
errmsg("conflicting NO INHERIT declarations for not-null constraints
on column \"%s\"", column->colname),
errhint("specified primary key or identity sequence imply an inherited
not-null constraint will be created")

what do you think?
From a8055a5292a57e111790629d418fe1663e1ba8ec Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 27 Dec 2024 13:36:06 +0800
Subject: [PATCH v13 1/1] add error position for ATPrepAlterColumnType
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

within ATPrepAlterColumnType, for various
ereport(ERROR...), add parser_errposition for printing out the error position.

Author: Kirill Reshke 
Author: Jian He 
Reviewed-By: Michaël Paquier 
Reviewed-By: Álvaro Herrera 

discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
---
 src/backend/commands/tablecmds.c  | 25 +++
 src/test/regress/expected/alter_table.out | 14 +++
 .../regress/expected/generated_stored.out |  2 ++
 src/test/regress/expected/typed_table.out |  2 ++
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4937478262..f0acb008ea 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13222,10 +13222,12 @@ ATPrepAlterColumnType(List **wqueue,
 	AclResult	aclresult;
 	bool		is_expr;
 
+	pstate->p_sourcetext = context->queryString;
 	if (rel->rd_rel->reloftype && !recursing)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot alter column type of typed table")));
+ errmsg("cannot alter column type of typed table"),
+ parser_errposition(pstate, def->location)));
 
 	/* lookup the attribute so we can check inheritance status */
 	tuple = SearchSysCacheAttName(RelationGetRelid(rel), colName);
@@ -13233,7 +13235,8 @@ ATPrepAlterColumnType(List **wqueue,
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_COLUMN),
  errmsg("column \"%s\" of relation \"%s\" does not exist",
-		colName, RelationGetRelationName(rel))));
+		colName, RelationGetRelationName(rel)),
+ parser_errposition(pstate, def->location)));
 	attTup = (Form_pg_attribute) GETSTRUCT(tuple);
 	attnum = attTup->attnum;
 
@@ -13241,8 +13244,8 @@ ATPrepAlterColumnType(List **wqueue,
 	if (attnum <= 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot alter system column \"%s\"",
-		colName)));
+ errmsg("cannot alter system column \"%s\"",colName),
+ parser_errposition(pstate, def->location)));
 
 	/*
 	 * Cannot specify USING when altering type of a generated column, because
@@ -13252,7 +13255,8 @@ ATPrepAlterColumnType(List **wqueue,
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_COLUMN_DEFINITION),
  errmsg("cannot specify USING when altering type of generated column"),
- errdetail("Column \"%s\" is a generated column.", colName)));
+ errdetail("Column \"%s\" is a generated column.", colName),
+ parser_errposition(pstate, def->location)));
 
 	/*
 	 * Don't alter inherited columns.  At outer level, there had better not be
@@ -13262,8 +13266,8 @@ ATPrepAlterColumnType(List **wqueue,
 	if (attTup->attinhcount > 0 && !recursing)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("cannot alter inherited column \"%s\"",
-		colName)));
+ errmsg("cannot alter inherited column \"%s\"", colName),
+ parser_errpositi

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread wenhui qiu
Hi Tomas

> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
Agree.
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
Yes. What I mean is that the maximum value is not friendly to large
instances if the setting is small. In real production instances, many
sub-tables with the same table structure are often encountered.


On Fri, Dec 27, 2024 at 1:58 AM Tomas Vondra  wrote:

> On 12/26/24 17:00, wenhui qiu wrote:
> > Hi
> >
> > As far as I know, instances with more than 10,000 tables are often
> > encountered, so I insist that the maximum can be appropriately
> > increased to 256MB, to better adapt to many-table situations.
> >
>
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
>
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
>
>
> regards
>
> --
> Tomas Vondra
>
>


Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-26 Thread vignesh C
On Fri, 13 Dec 2024 at 15:33, Shubham Khanna wrote:
>
> On Fri, Dec 13, 2024 at 2:39 PM vignesh C  wrote:
> >
> > On Fri, 13 Dec 2024 at 13:01, Shubham Khanna wrote:
> > >
> > > On Fri, Dec 13, 2024 at 12:20 PM Peter Smith wrote:
> > > >
> > > > Hi Shubham.
> > > >
> > > > Here are my review comments for v6-0001.
> > > >
> > > > ==
> > > > 1.
> > > > +# Verify that the subtwophase is enabled ('e') in the pg_subscription 
> > > > catalog
> > > > +$node_s->poll_query_until('postgres',
> > > > + "SELECT count(1) = 2 FROM pg_subscription WHERE subtwophasestate = 
> > > > 'e';")
> > > > +  or die "Timed out while waiting for subscriber to enable twophase";
> > > > +
> > > >
> > > > This form of the SQL is probably OK but it's a bit tricky; probably
> > > > the comment should have explained where that count "2" comes from.
> > > >
> > > > ~~
> > > >
> > > > I think it was OK as before (v5) to be querying until nothing was NOT
> > > > 'e'. In other words, until everything was enabled 'e'.
> > > > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN 
> > > > ('e');
> > > >
> > > > ~~
> > > >
> > > > OTOH, to save execution time we really would be satisfied with both
> > > > 'p' and 'e' states here. (we don't strictly need to wait for the
> > > > transition from 'p' to 'e' to occur).
> > > >
> > > > So, SQL like the one below might be the best:
> > > >
> > > > # Verify that all subtwophase states are pending or enabled,
> > > > # e.g. there are no subscriptions where subtwophase is disabled ('d').
> > > > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate IN ('d')
> > > >
> > >
> > > I have fixed the given comment. Since prepared transactions are not
> > > being used anymore, I have removed it from the test file.
> > > The attached patch contains the suggested changes.
> >
> > Few comments:
> > 1) Since we are providing --enable-two-phase option now and user can
> > create subscriptions with two-phase option this warning can be removed
> > now:
> > 
> > -pg_createsubscriber sets up logical
> > -replication with two-phase commit disabled.  This means that any
> > -prepared transactions will be replicated at the time
> > -of COMMIT PREPARED, without advance preparation.
> > -Once setup is complete, you can manually drop and re-create the
> > -subscription(s) with
> > +If --enable-two-phase is not specified, the
> > +pg_createsubscriber sets up logical 
> > replication
> > +with two-phase commit disabled.  This means that any prepared 
> > transactions
> > +will be replicated at the time of COMMIT PREPARED,
> > +without advance preparation. Once setup is complete, you can manually 
> > drop
> > +and re-create the subscription(s) with
> >  the <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
> > -option enabled.
> >
> > 2) Since we are not going to wait till the subscriptions are enabled,
> > we can use safe_psql instead of poll_query_until, something like:
> > is($node_s->safe_psql('postgres',
> > "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate = 'd'"),
> > 't', 'subscriptions are created with the two-phase option enabled');
> >
> > instead of:
> > +$node_s->poll_query_until('postgres',
> > +   "SELECT count(1) = 0 FROM pg_subscription WHERE
> > subtwophasestate IN ('d');"
> > +);
> >
> > 3) This can be removed from the commit message:
> > Documentation has been updated to reflect the new option, and test cases 
> > have
> > been added to validate various scenarios, including proper validation of the
> > '--enable-two-phase' option and its combinations with other options.
> >
> > 4) Should" two-phase option" be "enable-two-phase option" here:
> > const char *sub_username;   /* subscriber username */
+   bool    two_phase;  /* two-phase option */
> > SimpleStringList database_names;/* list of database names */
> >
>
> I have fixed the given comments. The attached patch contains the
> suggested changes.

The documentation requires a minor update: instead of specifying
subscriptions, the user will specify multiple databases, and the
subscription will be created on the specified databases. Documentation
should be updated accordingly:
+  
+   Enables <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
+   commit for the subscription. If there are multiple subscriptions
+   specified, this option applies to all of them.
+   The default is false.

Regards,
Vignesh