Re: Make pg_stat_io view count IOs as bytes instead of blocks
Hi,

On Tue, 24 Dec 2024 at 09:13, Michael Paquier wrote:
>
> On Fri, Dec 06, 2024 at 12:41:55PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! I think 'track' is a better word in this context. I used
> > 'tracked in ...', as it sounded more correct to me (I hope it is).
>
> Splitting op_bytes into three fields sounds like a good idea. Count
> me in regarding the concept to depend less on BLCKSZ.
>
> typedef enum IOOp
> {
> +	/* IOs not tracked in bytes */
> 	IOOP_EVICT,
> -	IOOP_EXTEND,
> 	IOOP_FSYNC,
> 	IOOP_HIT,
> -	IOOP_READ,
> 	IOOP_REUSE,
> -	IOOP_WRITE,
> 	IOOP_WRITEBACK,
> +
> +	/* IOs tracked in bytes */
> +	IOOP_EXTEND,
> +	IOOP_READ,
> +	IOOP_WRITE,
> } IOOp;
>
> pg_stat_io_build_tuples() is now the code path taken when building the
> tuples returned by pg_stat_io, meaning that the new function called
> pg_stat_get_backend_io() is also going to need an update in its list
> of output parameters to accommodate what you are changing here.
> Calling your new pgstat_get_io_byte_index() in the new refactored
> routine is also necessary.  Sorry about that.

No problem at all, thank you for reminding me!

> Could you send a rebase, please?  I can promise to review this
> thread's new patch, as that will also move the needle regarding your
> work with pg_stat_io to track WAL activity.

Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io()
function and it seems to be working.

--
Regards,
Nazir Bilal Yavuz
Microsoft

From b5c8b86aebd1149b30bc1715921edef161f99ba7 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Thu, 26 Dec 2024 12:05:13 +0300
Subject: [PATCH v4] Make pg_stat_io count IOs as bytes instead of blocks

Currently, the pg_stat_io view counts IOs as blocks. There are two
problems with this approach:

1- The actual number of I/O requests sent to the kernel is lower
because I/O requests may be merged before being sent. Additionally, it
gives the impression that all I/Os are done in block size, which
obscures the benefits of merging I/O requests.

2- There may be some IOs which are not done in block size in the
future. For example, WAL read IOs are done in variable bytes, and it is
not possible to show these IOs correctly in the pg_stat_io view.

Because of these problems, show instead the total number of I/O
requests sent to the kernel (as smgr function calls) and the total
number of bytes in those IOs. Also, the op_bytes column is removed from
the pg_stat_io view.
---
 src/include/catalog/pg_proc.dat             | 12 +--
 src/include/pgstat.h                        | 39 --
 src/backend/catalog/system_views.sql        |  4 +-
 src/backend/storage/buffer/bufmgr.c         | 14 ++--
 src/backend/storage/buffer/localbuf.c       |  6 +-
 src/backend/storage/smgr/md.c               |  4 +-
 src/backend/utils/activity/pgstat_backend.c |  2 +
 src/backend/utils/activity/pgstat_io.c      | 16 ++--
 src/backend/utils/adt/pgstatfuncs.c         | 86 -
 src/test/regress/expected/rules.out         |  6 +-
 doc/src/sgml/monitoring.sgml                | 61 +--
 11 files changed, 172 insertions(+), 78 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2dcc2d42dac..00051d41e0f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5908,18 +5908,18 @@
   proname => 'pg_stat_get_io', prorows => '30', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  proallargtypes => '{text,text,text,int8,numeric,float8,int8,numeric,float8,int8,float8,int8,numeric,float8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_bytes,read_time,writes,write_bytes,write_time,writebacks,writeback_time,extends,extend_bytes,extend_time,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 { oid => '8806', descr => 'statistics: backend IO statistics',
   proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
   proargtypes => 'int4',
-  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hit
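As a quick illustration of what the reworked view exposes — a sketch
only, with the column names taken from the proargnames list in the
patch above — the per-byte counters make it possible to see the average
size of the I/Os actually issued, which before this patch would always
have been BLCKSZ:

SELECT backend_type, object, context,
       reads, read_bytes,
       round(read_bytes::numeric / NULLIF(reads, 0)) AS avg_read_size
  FROM pg_stat_io
 WHERE reads > 0;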
Removing unused parameter in compute_expr_stats
Hi hackers,

I'm sharing a small patch that removes an unused parameter (totalrows)
from the compute_expr_stats function in extended_stats.c. This function
originally appeared in commit a4d75c86bf15220df22de0a92c819ecef9db3849,
which introduced extended statistics on expressions. From what I can
see, totalrows is never used in the current implementation, so this
patch removes it.

If there are reasons to keep this parameter, or if there's anything I
might have missed, I'd be happy to discuss further.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

From 0de17516bec761474761bf2862724b23ccb11beb Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov
Date: Thu, 26 Dec 2024 12:09:14 +0300
Subject: [PATCH v1] Remove unused totalrows parameter in compute_expr_stats

The totalrows parameter in compute_expr_stats is never referenced in
the function or by its callers. It appears to have been introduced in
commit a4d75c86bf15220df22de0a92c819ecef9db3849, but it has remained
unused.
---
 src/backend/statistics/extended_stats.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 99fdf208db..85da3edb10 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -89,9 +89,8 @@ typedef struct AnlExprData
 	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
 } AnlExprData;

-static void compute_expr_stats(Relation onerel, double totalrows,
-							   AnlExprData *exprdata, int nexprs,
-							   HeapTuple *rows, int numrows);
+static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
+							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -220,8 +219,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 			exprdata = build_expr_data(stat->exprs, stattarget);
 			nexprs = list_length(stat->exprs);

-			compute_expr_stats(onerel, totalrows,
-							   exprdata, nexprs,
+			compute_expr_stats(onerel, exprdata, nexprs,
 							   rows, numrows);

 			exprstats = serialize_expr_stats(exprdata, nexprs);
@@ -2107,9 +2105,8 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
  * Compute statistics about expressions of a relation.
  */
 static void
-compute_expr_stats(Relation onerel, double totalrows,
-				   AnlExprData *exprdata, int nexprs,
-				   HeapTuple *rows, int numrows)
+compute_expr_stats(Relation onerel, AnlExprData *exprdata,
+				   int nexprs, HeapTuple *rows, int numrows)
 {
 	MemoryContext expr_context,
 				old_context;
--
2.34.1
Re: RFC: Allow EXPLAIN to Output Page Fault Information
On Thu, 26 Dec 2024 13:15:59 +0900 torikoshia wrote:
> On 2024-12-25 00:52, Tom Lane wrote:
> > torikoshia writes:
> >> I have attached a PoC patch that modifies EXPLAIN to include page
> >> fault information during both the planning and execution phases of
> >> a query.
> >
> > Surely these numbers would be too unstable to be worth anything.
>
> Thanks for your comment!
>
> Hmm, I didn't know these are unstable. If there are any reasons, I'd
> like to know about them.
>
> I would like to explore alternative methods for measuring resource
> usage, but I am not aware of other approaches.
> (IIUC pg_stat_kcache[1], which is said to provide information about
> filesystem-layer I/O usage, also gets its metrics from getrusage(2))

What I'd really like to see is a column added to pg_stat_database
called blks_read_majflts.

It would be great if we could calculate a cache hit ratio that took OS
major page faults into account. Yes, this could also be done in
pg_stat_kcache, but why not do it right in pg_stat_database? I think it
would be pretty widely appreciated and used.

-Jeremy
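P.S. To make that concrete, here is the kind of query this would enable
— a sketch only, since blks_read_majflts is merely the proposed column
and this is just one possible definition of the ratio (reads that did
not major-fault are counted as OS page cache hits):

SELECT datname,
       blks_hit, blks_read,
       blks_read_majflts,  -- hypothetical: the column proposed above
       round((blks_hit + blks_read - blks_read_majflts)::numeric
             / NULLIF(blks_hit + blks_read, 0), 4) AS os_aware_hit_ratio
  FROM pg_stat_database;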
Re: Fix crash when non-creator being an iteration on shared radix tree
On Fri, Dec 20, 2024 at 9:55 PM John Naylor wrote:
>
> On Sat, Dec 21, 2024 at 2:17 AM Masahiko Sawada wrote:
> >
> > On Fri, Dec 20, 2024 at 2:27 AM John Naylor wrote:
> > >
> > > v3-0001 allocates the iter data in the caller's context. It's a small
> > > patch, but still a core behavior change, so it is proposed for
> > > master-only. I believe your v1 is still the least invasive fix for
> > > PG17.
> >
> > I agree to use v1 for v17.
>
> Okay, did you want to commit that separately, or together with my 0001
> on master? Either way, I've put a bit more effort into the commit
> message in v4 and adjusted the comment slightly.

I think we can commit them separately. It's a pure bug fix for back
branches, while for HEAD it removes the bug by an improvement.

> > > It's trivial after 0001-02: 0003 makes one test use slab as
> > > the passed context (only 32-bit systems would actually use it
> > > currently).
> >
> > These changes make sense to me. Here are a few comments:
> >
> > RT_RADIX_TREE has 'leaf_context', but it seems that we use it only
> > for local memory. Do we want to declare it only in the !RT_SHMEM
> > case?
>
> That's already the case, if I understand your statement correctly.

True. Sorry, I misunderstood something.

> > ---
> > /*
> >  * Similar to TidStoreCreateLocal() but create a shared TidStore on a
> >  * DSA area. The TID storage will live in the DSA area, and the memory
> >  * context rt_context will have only meta data of the radix tree.
> >  *
> >  * The returned object is allocated in backend-local memory.
> >  */
> >
> > We need to update the comment about rt_context too, since we no
> > longer use rt_context for shared tidstore.
>
> Fixed.

The three patches look good to me.

> BTW, it seems TidStore.context is unused?

Indeed. We can remove it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Logging parallel worker draught
This is just a rebase.

As stated before, I added some information to the error message for
parallel queries as an experiment. It can be removed if you're not
convinced.

---
Benoit Lobréau
Consultant
http://dalibo.com

From 7e63f79226357f2fe84acedb950e64383e582b17 Mon Sep 17 00:00:00 2001
From: benoit
Date: Tue, 8 Oct 2024 12:39:41 +0200
Subject: [PATCH 1/5] Add a guc for parallel worker logging

The new guc log_parallel_workers controls whether a log message is
produced to display information on the number of workers spawned when a
parallel query or utility is executed. The default value is `none`
which disables logging. `all` displays information for all parallel
queries, whereas `failures` displays information only when the number
of workers launched is lower than the number of planned workers.

This new parameter can help database administrators and developers
diagnose performance issues related to parallelism and optimize the
configuration of the system accordingly.
---
 doc/src/sgml/config.sgml                      | 18 ++
 src/backend/access/transam/parallel.c         | 15 +++
 src/backend/utils/misc/guc_tables.c           | 12 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/parallel.h                 | 19 +++
 5 files changed, 65 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fbdd6ce574..3a5f7131b3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7752,6 +7752,24 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
+
+      log_parallel_workers (enum)
+
+       log_parallel_workers configuration parameter
+
+
+
+        Controls whether a log message about the number of workers is produced during the
+        execution of a parallel query or utility statement. The default value is
+        none which disables logging. all displays
+        information for all parallel queries, whereas failures displays
+        information only when the number of workers launched is lower than the number of
+        planned workers.
+
+
+
       log_parameter_max_length (integer)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0a1e089ec1..e1d378efa6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1653,3 +1653,18 @@ LookupParallelWorkerFunction(const char *libraryname, const char *funcname)
 	return (parallel_worker_main_type)
 		load_external_function(libraryname, funcname, true, NULL);
 }
+
+/*
+ * The function determines if information about workers should
+ * be logged.
+ */
+bool
+LoggingParallelWorkers(int log_parallel_workers,
+					   int parallel_workers_to_launch,
+					   int parallel_workers_launched)
+{
+	return ((log_parallel_workers == LOG_PARALLEL_WORKERS_ALL &&
+			 parallel_workers_to_launch > 0) ||
+			(log_parallel_workers == LOG_PARALLEL_WORKERS_FAILURE &&
+			 parallel_workers_to_launch != parallel_workers_launched));
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8cf1afbad2..cc4d70177d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -482,6 +482,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry wal_sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry log_parallel_workers_options[];

 /*
  * GUC option variables that are exported from this module
@@ -526,6 +527,7 @@
 int			log_min_duration_statement = -1;
 int			log_parameter_max_length = -1;
 int			log_parameter_max_length_on_error = 0;
 int			log_temp_files = -1;
+int			log_parallel_workers = LOG_PARALLEL_WORKERS_NONE;
 double		log_statement_sample_rate = 1.0;
 double		log_xact_sample_rate = 0;
 char	   *backtrace_functions;
@@ -5226,6 +5228,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},

+	{
+		{"log_parallel_workers", PGC_SUSET, LOGGING_WHAT,
+			gettext_noop("Log information about parallel worker usage"),
+			NULL
+		},
+		&log_parallel_workers,
+		LOG_PARALLEL_WORKERS_NONE, log_parallel_workers_options,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a2ac7575ca..f11bf173ff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -622,6 +622,7 @@
 #log_temp_files = -1			# log temporary files equal or larger
 					# than the specified size in kilobytes;
 					# -1 disables, 0 logs all temp files
+#log_parallel_workers = none		# none, all, failure
 #
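As a usage sketch (assuming the patch as posted: the GUC is PGC_SUSET,
and the documented values are none, all, and failures):

-- log only when fewer workers were launched than planned
ALTER SYSTEM SET log_parallel_workers = 'failures';
SELECT pg_reload_conf();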
RE: Psql meta-command conninfo+
>Hi,
>
>I provided a patch a month ago with a new approach, as suggested by David.
>Unfortunately, it didn't get any attention in the November CF.
>
>I would appreciate your feedback on whether we should proceed with this new
>approach or stick with the previous one.
>
>Regards,
>Hunaid Sohail

--//--

Hi,

The previous version was good to go and ready for commit as soon as the
final review is finalized. About David’s proposal, I’m still a little
unsure, but it seems like it has a lot of potential. What do you all
think? If it’s a strong direction to explore, maybe David could join us
as a co-author and work with us to bring the feature to completion.

Regards,
Maiquel.
Re: Memory leak in pg_logical_slot_{get,peek}_changes
On Thu, 12 Dec 2024 at 20:32, vignesh C wrote:
>
> On Wed, 11 Dec 2024 at 11:39, Zhijie Hou (Fujitsu) wrote:
> >
> > On Wednesday, December 11, 2024 12:28 PM vignesh C wrote:
> > > Hi,
> > >
> > > I'm starting a new thread for one of the issues reported by Sawada-san at
> > > [1].
> > >
> > > This is a memory leak on CacheMemoryContext when using pgoutput via SQL
> > > APIs:
> > > /* Map must live as long as the session does. */
> > > oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > >
> > > entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
> > >
> > > MemoryContextSwitchTo(oldctx);
> > > RelationClose(ancestor);
> > >
> > > entry->attrmap is pfree'd only when validating the RelationSyncEntry,
> > > so it remains even after logical decoding API calls.
> > >
> > > This issue can be reproduced with the following test:
> > > -- Create table
> > > create table t1( c1 int, c2 int, c3 int) PARTITION BY LIST (c1);
> > > create table t1_1( c2 int, c1 int, c3 int);
> > > ALTER TABLE t1 ATTACH PARTITION t1_1 FOR VALUES IN (0, 1, 2, 3);
> > >
> > > -- Create publication
> > > create publication pub1 FOR TABLE t1, t1_1 WITH
> > > (publish_via_partition_root = true);
> > >
> > > -- Create replication slot
> > > SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput');
> > >
> > > -- Run the below steps between Test start and Test end many times to
> > > see the memory usage increase
> > > -- Test start
> > > insert into t1 values( 1);
> > > SELECT count(*) FROM pg_logical_slot_get_binary_changes('test', NULL,
> > > NULL, 'proto_version', '4', 'publication_names', 'pub1');
> > >
> > > -- Memory increases after each invalidation and insert
> > > SELECT * FROM pg_backend_memory_contexts where name =
> > > 'CacheMemoryContext';
> > > -- Test end
> > >
> > > The attached patch resolves a memory leak by ensuring that the
> > > attribute map is properly freed during plugin shutdown. This process
> > > is triggered by the SQL API when the decoding context is being
> > > released. Additionally, I am conducting a review to identify and
> > > address any similar memory leaks that may exist elsewhere in the code.
> >
> > Thanks for reporting the issue and sharing the fix.
> >
> > I am not sure if freeing them in the shutdown callback is safe,
> > because the shutdown callback is not invoked in case of ERRORs. I
> > think we'd better allocate them under cachectx in the beginning to
> > avoid freeing them explicitly.
> >
> > The attachment is a POC that I experimented with internally before.
> > It replaces not only the attrmap's context, but also others which
> > were allocated under CacheMemoryContext, to be consistent. Note that
> > I have not tested it deeply. Feel free to use it if it works for you.
> >
> > But the patch can only be used since PG15, where cachectx was
> > introduced. In older branches, we may need to think of some other
> > ways.
>
> Since we cannot add a new member to PGOutputData, how about creating a
> new global static memory context specifically for storing the relation
> entry attribute map? This memory context could be allocated and reset
> during pgoutput_startup. This way, if a SQL call is made after an
> error, the context will be reset, preventing memory bloat as addressed
> in the attached patch. Thoughts?

Here is an updated version which registers a callback to reset the
memory context, in line with a recently committed patch at [1].
[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=cfd6cbcf9be078fbdf9587014231a5772039b386

Regards,
Vignesh

From abf964263e5de78d44f6235f90789cfe5143b534 Mon Sep 17 00:00:00 2001
From: Vignesh
Date: Thu, 26 Dec 2024 11:37:43 +0530
Subject: [PATCH v2] Fix memory leak in pgoutput relation attribute map

The pgoutput module caches the relation attribute map only when
validating the RelationSyncEntry. However, the map was not getting
freed in case of SQL functions like pg_logical_slot_{get,peek}_changes(),
which would bloat the cache memory usage. Every
pg_logical_slot_{get,peek}_changes() call for changes on a partition
table creates more bloat. To address this, the relation attribute map
is now allocated in the plugin's cachectx, which is freed at the end of
pg_logical_slot_{get,peek}_changes() execution.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6db780d733..6c19cc1bb2 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1197,8 +1197,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 		TupleDesc	indesc = RelationGetDescr(relation);
 		TupleDesc	outdesc = RelationGetDescr(ancestor);

-		/* Map must live as long as the session does. */
-		oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+		/* Map must live as long as the logical decoding context. */
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hello, Hou!

> So, I think it's not a bug in the committed patch but an issue in the
> testing environment. Besides, since we have not seen such failures on
> BF, I think it may not be necessary to improve the testcases.

Thanks for your analysis! Yes, probably WSL2/Windows interactions cause
strange system clock movement. It looks like it is a common issue with
WSL2 [0].

[0]: https://github.com/microsoft/WSL/issues/10006

Best regards,
Mikhail.
Re: attndims, typndims still not enforced, but make the value within a sane threshold
On Mon, Dec 23, 2024 at 01:56:24PM +0900, Michael Paquier wrote:
> On Thu, Dec 12, 2024 at 03:40:32PM +0800, jian he wrote:
> > remove pg_type typndims column patch attached.
>
> FWIW, I have been paying more attention to applications that may use
> this attribute and bumped into quite a few cases that use quals based
> on (t.typndims > 0 AND t.typbasetype > 0) to check that they're
> dealing with domains over array types. So even removing this switch
> would be unwise, I am afraid..

Well, I would be more excited about keeping the fields if they actually
were reliable in recording information. This email from November 2023
explains the limitations of attndims:

	https://www.postgresql.org/message-id/flat/20150707072942.1186.98...@wrigleys.postgresql.org

* dimensions not dumped by pg_dump
* dimensions not propagated by CREATE TABLE ... (LIKE)
* dimensions not propagated by CREATE TABLE AS
* dimensions not displayed by psql

So, if users are referencing attndims and the values are not accurate,
how useful are they? I was suggesting removal so people would stop
relying on inaccurate information, or that we correct attndims to be
accurate. Allowing people to continue relying on inaccurate information
seems like the worst of all options.

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
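To make the CREATE TABLE AS limitation in the list above concrete, a
quick demonstration anyone can run (a sketch; the table names are
mine):

CREATE TABLE declared (a int[][]);
CREATE TABLE derived AS SELECT * FROM declared;

SELECT c.relname, a.attname, a.attndims
  FROM pg_attribute a
  JOIN pg_class c ON c.oid = a.attrelid
 WHERE a.attname = 'a'
   AND c.relname IN ('declared', 'derived');
-- 'declared' reports attndims = 2 while 'derived' reports 0,
-- even though both columns hold the same kind of values.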
Re: Add XMLNamespaces to XMLElement
Hi Umar, hi Pavel,

Thanks for taking a look at this patch!

On 26.12.24 05:15, Umar Hayat wrote:
> Hi,
> +1 for the enhancement.
>
> I haven't compiled and reviewed the full patch yet, please see a few
> comments from my side based on static analysis.
>
> 1. Though the changes are targeted at XMLNAMESPACES for XMLElement, in
> my opinion it will affect XMLTABLE as well, because the
> 'xml_namespace_list' rule is shared now.
> Adding 'NO DEFAULT' in xml_namespace_list will allow users to use it
> with XMLTABLE XMLNAMESPACES as well. The PostgreSQL grammar allows
> specifying DEFAULT in NAMESPACE, but it results in the following error:
> "ERROR: DEFAULT namespace is not supported"

I also considered creating a new rule to avoid any conflict with
XMLTable, but as it didn't break any regression test and the result
would be pretty much the same as with "DEFAULT 'str'", I thought that
extending the existing rule would be the way to go.

SELECT * FROM XMLTABLE(XMLNAMESPACES(NO DEFAULT),
  '/rows/row'
  PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'
  COLUMNS a int PATH 'a');
ERROR:  DEFAULT namespace is not supported

What do you think?

> What would be the behavior of this change for XMLTABLE? Should this be
> allowed and the error messages need to be updated (maybe this will
> not be an error at all), or do we need to restrict users from using
> 'NO DEFAULT' with XMLTable?

Perhaps updating the error message would suffice?

> 2. Should we reuse the 'xml_namespaces' rule for XMLTable, as the
> definition is the same?

That would be good. I'm just afraid it would deviate a bit from the
scope of this patch - here I mean touching other functions. Would you
suggest adding it as part of a patch series?

> 3. In this patch the 'NO DEFAULT' behavior is like DEFAULT ''
> (empty uri); should it not be more like 'DEFAULT NULL', to result in
> the following?
> SELECT xmlelement(NAME "root", xmlnamespaces(NO DEFAULT));
>  xmlelement
> ------------
>  <root/>
>
> instead of
>
> SELECT xmlelement(NAME "root", xmlnamespaces(NO DEFAULT));
>     xmlelement
> ------------------
>  <root xmlns=""/>

The idea of NO DEFAULT is pretty much to free an element (and its
children) from a previous DEFAULT in the same scope.

SELECT xmlserialize(DOCUMENT
  xmlelement(NAME "root",
    xmlnamespaces(DEFAULT 'http:/x.y/ns1'),
    xmlelement(NAME "foo",
      xmlnamespaces(NO DEFAULT))
  ) AS text INDENT);

         xmlserialize
------------------------------
 <root xmlns="http:/x.y/ns1">+
   <foo xmlns=""/>           +
 </root>
(1 row)

I believe this behaviour might be confusing if NO DEFAULT is used in
the root element, as there is no previously declared namespace. Perhaps
making NO DEFAULT behave like DEFAULT NULL only in the root element
would make things clearer? The SQL/XML spec doesn't say anything
specific about it, but DB2 had the same thought[1]. For reference, here
are the regress tests[2] of this patch tested with the DB2
implementation.

> On Sat, 21 Dec 2024 at 14:57, Pavel Stehule wrote:
>> +1
>>
>> Pavel
>>
>> rebase in v2 attached - due to changes in gram.y

Thanks a lot

Best, Jim

1 - https://dbfiddle.uk/0QsWlfZR
2 - https://dbfiddle.uk/SyiDfXod

From 93422b076b2d270ca9293cf4e0be3da62bb6af1c Mon Sep 17 00:00:00 2001
From: Jim Jones
Date: Thu, 26 Dec 2024 14:40:26 +0100
Subject: [PATCH v2] Add XMLNamespaces option to XMLElement

This patch adds the scoped option XMLNamespaces to XMLElement, as
described in ISO/IEC 9075-14:2023, 11.2 XML lexically scoped options:

xmlnamespaces(uri AS prefix, ...)
xmlnamespaces(DEFAULT uri, ...)
xmlnamespaces(NO DEFAULT, ...)

* prefix: Namespace's prefix.
* uri: Namespace's URI.
* DEFAULT uri: Specifies the DEFAULT namespace to use within
  the scope of a namespace declaration.
* NO DEFAULT: Specifies that NO DEFAULT namespace is to be used
  within the scope of a namespace declaration.

== Examples ==

SELECT xmlelement(NAME "foo", xmlnamespaces('http:/x.y' AS bar));
          xmlelement
------------------------------
 <foo xmlns:bar="http:/x.y"/>

SELECT xmlelement(NAME "foo", xmlnamespaces(DEFAULT 'http:/x.y'));
        xmlelement
--------------------------
 <foo xmlns="http:/x.y"/>

SELECT xmlelement(NAME "foo", xmlnamespaces(NO DEFAULT));
    xmlelement
------------------
 <foo xmlns=""/>

Tests and documentation were updated accordingly.
---
 doc/src/sgml/func.sgml              |  57 +++-
 src/backend/parser/gram.y           |  73 +++---
 src/backend/parser/parse_expr.c     |  73 ++
 src/backend/utils/adt/xml.c         |  32 -
 src/include/nodes/primnodes.h       |   4 +-
 src/include/utils/xml.h             |   6 +
 src/test/regress/expected/xml.out   | 205 
 src/test/regress/expected/xml_1.out | 151 
 src/test/regress/expected/xml_2.out | 205 
 src/test/regress/sql/xml.sql        | 100 ++
 10 files changed, 884 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..b33663bc09 100644
--- a/doc/src/
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Hi,

As far as I know, instances with more than 10,000 tables are often
encountered, so I insist that the maximum can reasonably be increased
to 256MB, to better accommodate many-table situations.

On Thu, 26 Dec 2024 at 23:23, Robert Treat wrote:
> On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra wrote:
> >
> > Hi,
> >
> > On 12/23/24 07:35, wenhui qiu wrote:
> > > Hi Tomas
> > > This is a great feature.
> > > + /*
> > > + * Define (or redefine) custom GUC variables.
> > > + */
> > > + DefineCustomIntVariable("stats_history.size",
> > > + "Sets the amount of memory available for past events.",
> > > + NULL,
> > > + &statsHistorySizeMB,
> > > + 1,
> > > + 1,
> > > + 128,
> > > + PGC_POSTMASTER,
> > > + GUC_UNIT_MB,
> > > + NULL,
> > > + NULL,
> > > + NULL);
> > > +
> > > RAM is in terabytes now, the statsHistorySize is 128MB, I think it
> > > can be increased to store more history records?
> > >
> >
> > Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> > for ~500k events, which seems good enough for most systems. Of course,
> > systems with many relations might need more space, not sure.
> >
> > I was thinking about specifying the space in more natural terms, either
> > as an amount of time ("keep 1 day of history") or a number of entries
> > ("10k entries"). That would probably mean the memory can't be allocated
> > as a fixed size.
> >
>
> Based on the above, a rough calculation is that this is enough for
> holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
> vacuums for 1000 tables. Most folks will fall somewhere in that range
> (and won't really need a year's history), but that seems like plenty
> for a default.
>
> > But maybe it'd be possible to just write the entries to a file. We don't
> > need random access to past entries (unlike e.g. pg_stat_statements), and
> > people won't query that very often either.
> >
>
> Yeah, workloads will vary, but it doesn't seem like they would vary
> more than query workloads do.
>
> Robert Treat
> https://xzilla.net
>
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
On 12/26/24 17:00, wenhui qiu wrote:
> Hi
>
> As far as I know, instances with more than 10,000 tables are often
> encountered, so I insist that the maximum can reasonably be increased
> to 256MB, to better accommodate many-table situations.
>

If 128MB is insufficient, why would 256MB be OK? A factor of 2x does
not make a fundamental difference ...

Anyway, the 128MB value is rather arbitrary. I don't mind increasing
the limit, or possibly removing it entirely (and accepting anything the
system can handle).

regards

--
Tomas Vondra
Re: Add Postgres module info
On Mon, Dec 23, 2024, at 8:49 PM, Andrei Lepikhov wrote:
> Looking into the control file, I see that most parameters are
> unnecessary for the library. Why do we have to maintain this file?

Most of the parameters apply to SQL extensions.

> The 'CREATE EXTENSION' statement would have made sense if we had
> register/unregister hook machinery. Without that, it seems it is just
> about maintaining the library's version and comments locally in a
> specific database.

Well, either way you have to load the extension: either CREATE
EXTENSION to load an SQL extension (and any related shared modules), or
LOAD or *_preload_libraries to load a shared module. I propose to add
support for shared-module-only extensions to CREATE/UPDATE/DROP
EXTENSION. It would then both insert the version info into the database
(from the control file, at least) and load the shared module(s).

> It would be interesting to read about your real-life cases that caused
> your proposal.

They're in the first section of [1]. The desire to group all the files
for an extension in a single directory led to a conflict with the
existing LOAD patterns, which in the final section of [1] I attempt to
resolve by proposing a single way to manage *all* extensions, instead
of the two separate patterns we have today.

Best,

David

[1]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/
Re: An improvement of ProcessTwoPhaseBuffer logic
Hi Michael,

> Here are two patches to address both issues:

Thank you for preparing the patches and the test simplification. My
bad, I overcomplicated it.

I am concerned about twophase file name generation. The function
TwoPhaseFilePath() is pretty straightforward and unambiguous in 16 and
earlier versions. The logic of this function in 17+ seems to be more
complex, and I do not understand it clearly. But, I guess, it will work
incorrectly after turning over to a newer epoch, because the epoch is
calculated from TransamVariables->nextXid, not from the problem xid.
The same problem may happen if we are in epoch 1 or greater: it will
produce a wrong file name, because the epoch will be obtained from the
last xid, not from the file name xid. In other words, the function
AdjustToFullTransactionId assumes that if xid >
TransamVariables->nextXid, then the xid is from the previous epoch. It
may not be the case in our scenario.

> I suspect that this can be still dangerous as-is while complicating
> the code with more possible paths for the removal of the 2PC files

Agree, but we could pass the file name into TwoPhaseFilePath(), if
available, instead of creating two functions as in the patch. The cost:
two new if conditions. Using file names is pretty safe. Once we read
the file and extract the xid from its name, just pass this file name to
TwoPhaseFilePath(). If it is not available, try to generate it. Anyway,
I do not insist on it, just trying to discuss.

With best regards,
Vitaly
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra wrote:
>
> Hi,
>
> On 12/23/24 07:35, wenhui qiu wrote:
> > Hi Tomas
> > This is a great feature.
> > + /*
> > + * Define (or redefine) custom GUC variables.
> > + */
> > + DefineCustomIntVariable("stats_history.size",
> > + "Sets the amount of memory available for past events.",
> > + NULL,
> > + &statsHistorySizeMB,
> > + 1,
> > + 1,
> > + 128,
> > + PGC_POSTMASTER,
> > + GUC_UNIT_MB,
> > + NULL,
> > + NULL,
> > + NULL);
> > +
> > RAM is in terabytes now, the statsHistorySize is 128MB, I think it
> > can be increased to store more history records?
> >
>
> Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> for ~500k events, which seems good enough for most systems. Of course,
> systems with many relations might need more space, not sure.
>
> I was thinking about specifying the space in more natural terms, either
> as an amount of time ("keep 1 day of history") or a number of entries
> ("10k entries"). That would probably mean the memory can't be allocated
> as a fixed size.
>

Based on the above, a rough calculation is that this is enough for
holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
vacuums for 1000 tables. Most folks will fall somewhere in that range
(and won't really need a year's history), but that seems like plenty
for a default.

> But maybe it'd be possible to just write the entries to a file. We don't
> need random access to past entries (unlike e.g. pg_stat_statements), and
> people won't query that very often either.
>

Yeah, workloads will vary, but it doesn't seem like they would vary
more than query workloads do.

Robert Treat
https://xzilla.net
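For anyone who wants to check that arithmetic, a back-of-envelope
calculation only, using the ~500k-events-in-128MB figure from upthread:

SELECT 50 * 24 * 365                 AS hourly_vacuums_50_tables,   -- 438000
       1000 * 365                    AS daily_vacuums_1000_tables,  -- 365000
       (128 * 1024 * 1024) / 500000  AS approx_bytes_per_entry;     -- ~268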
Bug in v13 due to "Fix corruption when relation truncation fails."
Good day.

Commit "Fix corruption when relation truncation fails." [0] makes
smgrtruncate be called in a critical section. Unfortunately, in version
13 this leads to an occasional call to palloc0 inside a critical
section, which triggers an Assert in debug builds:

- smgrtruncate calls mdtruncate
- mdtruncate may call register_dirty_segment
- register_dirty_segment calls RegisterSyncRequest
- RegisterSyncRequest calls ForwardSyncRequest
- ForwardSyncRequest may call CompactCheckpointerRequestQueue
- CompactCheckpointerRequestQueue may call palloc0
...

In versions 14 and above, CompactCheckpointerRequestQueue does check
for a critical section due to commit "Fix bugs in MultiXact truncation"
[1], which was backported down to version 14, but not 13.

We caught it in our private tests, so it is real. Cherry-picking [1]
from version 14 to 13 solves the issue.

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;h=2280912165d
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;h=4c8e00ae9ae

--
regards,
Yura Sokolov aka funny-falcon
Re: Statistics Import and Export
On Thu, Dec 19, 2024 at 09:23:20PM -0800, Jeff Davis wrote:
> On Fri, 2024-12-13 at 00:22 -0500, Corey Huinker wrote:
> > Per offline conversation with Jeff, adding a --no-schema option to
> > pg_dump, both for completeness (we already have --no-data and
> > --no-statistics) and because users who previously got the effect of
> > --no-schema did so by specifying --data-only, which suppresses
> > statistics as well.
> >
> > 0001-0005 - changes to pg_dump/pg_upgrade
>
> Attached is a version 36j where I consolidated these patches and
> cleaned up the documentation. It doesn't make a lot of sense to commit
> them separately, because as soon as the pg_dump changes are there, the
> pg_upgrade test starts showing a difference until it starts using the
> --no-data option.
>
> The biggest functional change is the way dependencies are handled for
> matview stats. Materialized views ordinarily end up in
> SECTION_PRE_DATA, but in some cases they can be postponed to
> SECTION_POST_DATA. You solved that by always putting the matview stats
> in SECTION_POST_DATA.
>
> I took a different approach here and, when the matview is postponed,
> also postpone the matview stats. It's slightly more code, but it felt
> closer to the rest of the structure, where postponing is a special
> case (that we might be able to remove in the future).

I am confused by this:

	Add options --with-statistics/--no-statistics to pg_upgrade to
	enable/disable transferring of statistics to the upgraded cluster.
	The default is --with-statistics.

If transferring statistics is the default for pg_upgrade, why would we
need a --with-statistics option? Also, I see a misspelling:

	+ printf(_("  --no-statisttics do not import statistics from old cluster\n"));

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
Re: Statistics Import and Export
On Wed, Dec 11, 2024 at 10:49:53PM -0500, Corey Huinker wrote:
> From cf4e731db9ffaa4e89d7c5d14b32668529c8c89a Mon Sep 17 00:00:00 2001
> From: Corey Huinker
> Date: Fri, 8 Nov 2024 12:27:50 -0500
> Subject: [PATCH v34 11/11] Add --force-analyze to vacuumdb.
>
> The vacuumdb options of --analyze-in-stages and --analyze-only are
> often used after a restore from a dump or a pg_upgrade to quickly
> rebuild stats on a database.
>
> However, now that stats are imported in most (but not all) cases,
> running either of these commands will be at least partially redundant,
> and will overwrite the stats that were just imported, which is a big
> POLA violation.
>
> We could add a new option such as --analyze-missing-in-stages, but
> that wouldn't help the userbase that has grown accustomed to running
> --analyze-in-stages after an upgrade.
>
> The least-bad option to handle the situation is to change the behavior
> of --analyze-only and --analyze-in-stages to only analyze tables which
> were missing stats before the vacuumdb started, but offer the
> --force-analyze flag to restore the old behavior for those who truly
> wanted it.

I am _again_ not happy with this part of the patch. Please reply to the
criticism in my November 19th email:

	https://www.postgresql.org/message-id/zz0t1benifdnx...@momjian.us

rather than ignoring it and posting the same version of the patch.

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
Re: Removing unused parameter in compute_expr_stats
On Thu, 26 Dec 2024 at 22:41, Ilia Evdokimov wrote:
> I'm sharing a small patch that removes an unused parameter (totalrows)
> from the compute_expr_stats function in extended_stats.c.

Thanks. It's a static function, so I agree that there's no need to keep
an unused parameter. Pushed.

David
Re: Add Postgres module info
On 12/27/24 01:26, David Wheeler wrote:
> On Mon, Dec 23, 2024, at 8:49 PM, Andrei Lepikhov wrote:
>> Looking into the control file, I see that most parameters are
>> unnecessary for the library. Why do we have to maintain this file?
>
> Well, either way you have to load the extension: either CREATE
> EXTENSION to load an SQL extension (and any related shared modules),
> or LOAD or *_preload_libraries to load a shared module. I propose to
> add support for shared-module-only extensions to CREATE/UPDATE/DROP
> EXTENSION. It would then both insert the version info into the
> database (from the control file, at least) and load the shared
> module(s).

I still can't get your point. We intentionally wrote a library, not an
extension. Following user usage and upgrade patterns, it works across
the whole instance, in any database or locally in a single backend, and
its impact ends at the end of its life. Also, it doesn't maintain any
object in the database and is managed by GUCs.

For example, my libraries add query tree transformations/path
recommendations to the planner. They don't depend on a database and
don't maintain DSM segments, and users sometimes want to use them in
specific backends, not databases - in a backend dedicated to analytic
queries, without extra overhead for backends picked out for short
queries.

For what reason do I need to add the complexity of calling 'CREATE
EXTENSION' here, adding version info only in a specific database? Just
because of a formal one-directory structure?

--
regards, Andrei Lepikhov
RE: Windows meson build
Hi,

Thank you for your advice. I added a sample setting of PKG_CONFIG_PATH
based on your advice. What do you think about this?

Regards,
Kohei Harikae

-----Original Message-----
From: Vladlen Popolitov
Sent: Wednesday, December 25, 2024 7:28 PM
To: Harikae, Kohei/張替 浩平
Cc: 'Andres Freund' ; 'pgsql-hackers@lists.postgresql.org'
Subject: Re: Windows meson build

Kohei Harikae (Fujitsu) wrote on 2024-12-18 05:05:

Hi!

> Hi,
>
> I apologize for this late reply.
>
> As you advised, I was able to create the ".pc" file manually and set
> the dependencies using pkgconf. Thank you.
>
> However, I have read the PostgreSQL documentation and did not realize
> that pkgconf is required.
>
> 17.4.3.2. PostgreSQL Features
> -Dgssapi
> On many systems, the GSSAPI system (a part of the MIT Kerberos
> installation) is not installed in a location that is searched by
> default (e.g., /usr/include, /usr/lib).
> In those cases, PostgreSQL will query pkg-config to detect the
> required compiler and linker options.
> -Dlibxml
> To use a libxml2 installation that is in an unusual location, you
> can set pkg-config-related environment variables (see its
> documentation).
>
> Only a few options have a pkg-config description.
> It would be easier to understand pkg-config in the "17.4.3. meson
> setup Options" section.
> Below is my proposal for the document description.
> What do you think about this?
>
> 17.4.3. meson setup Options
> PostgreSQL will query pkg-config to detect the required compiler and
> linker options.
> To use a module installed in an unusual location for a given feature
> (e.g. -Dgssapi, -Dlibxml), set pkg-config-related environment
> variables (see its documentation).

Meson has a more straightforward solution. It uses the environment
variable PKG_CONFIG_PATH, which can store a list of directories with
all the .pc files. Meson uses this list, in this specific order, to
build the project. As written in the meson documentation (in the source
of the open-source project):

# Instead of relying on pkg-config or pkgconf to provide -L flags in a
# specific order, we reorder library paths ourselves, according to the
# order specified in PKG_CONFIG_PATH. See:
# https://github.com/mesonbuild/meson/issues/4271
#

It would be clearer to state in the PostgreSQL documentation that all
directories with .pc files are listed in this one variable.

By the way, if you need to add, e.g., the libxml library to a
PostgreSQL build, you can:

1) build and install it using, e.g., vcpkg:
vcpkg install libxml2:x64-windows --x-install-root=c:\pbuild\libxml2

2) add the path to the pkgconfig directory to PKG_CONFIG_PATH:
SET PKG_CONFIG_PATH=c:\pbuild\libxml2\x64-windows\lib\pkgconfig;%PKG_CONFIG_PATH%

3) run the build steps:
meson setup c:\builddir\... --prefix=c:\postgresdir\...

It is just one install and one manipulation of an environment variable,
with no indeterminate behaviour from package-specific variables or
library ordering.

> Regards,
> Kohei Harikae
>
> -----Original Message-----
> From: Andres Freund
> Sent: Thursday, November 7, 2024 4:05 AM
> To: Harikae, Kohei/張替 浩平
> Cc: 'pgsql-hackers@lists.postgresql.org'
> Subject: Re: Windows meson build
>
> Hi,
>
> On 2024-11-05 06:32:51 +, Kohei Harikae (Fujitsu) wrote:
>> I do not use pkgconf in my Windows environment.
>> In my Windows environment, I could not build the following OSS with
>> meson.
>> - 0001 icu
>> - 0002 libxml
>> - 0003 libxslt
>> - 0004 lz4
>> - 0005 tcl
>> - 0006 zlib
>> - 0007 zstd
>>
>> In the [1] thread, I created a patch like the one in the attached
>> file, and now I can build.
>> Would you like to be able to build OSS with Windows meson without
>> using pkgconf?
>
> You can use pkgconf or cmake for the dependencies. I don't want to add
> "raw" dependency handling for every dependency; they each build in too
> many variants for that to be a sensible investment of time.
>
> Greetings,
>
> Andres Freund

--
Best regards,
Vladlen Popolitov.

Attachment: installation-meson.patch
Re: Using Expanded Objects other than Arrays from plpgsql
Michel Pelletier writes:
> On Mon, Dec 23, 2024 at 8:26 AM Tom Lane wrote:
>> 2. If the problem is primarily with passing the same object to
>> multiple parameters of a function, couldn't you detect and optimize
>> that within the function?

> Ah, that's a great idea, and it works beautifully! Now I can do an
> efficient triangle count without even needing a function; see below,
> expand_matrix is only called once:

Nice!

> That pretty much resolves my main issues. I'm still in an exploratory
> phase, but I think this gets me pretty far. Is this something that has
> to wait for 18 to be released? Also, do you need any further testing
> or code reviewing from me?

Yeah, this is not the kind of change we would back-patch, so it'll have
to wait for v18 (or later if people are slow to review it :-( ).

I don't think there's anything more we need from you, though of course
you should keep working on your code and report back if you hit
anything that needs further improvement on our end.

			regards, tom lane
WAL-logging facility for pgstats kinds
Hi all,

While brainstorming on the contents of the thread I posted a couple of
days ago, I have been looking at what could be done so that pgstats and
WAL-logging could work together. This was point 2) from this message:
https://www.postgresql.org/message-id/z2tblemfuofzy...@paquier.xyz

While considering the point that not all stats data is worth
replicating, I have fallen down to the following properties that are
nice to have across the board:

- A pgstats kind should be able to WAL-log data of its own choosing,
  including versioning of the data.
- The stats kind should be able to decide how this data is handled at
  recovery (different persistency depending on standby vs crash
  recovery, for one).
- Having to add one RMGR for each stats kind is not going to scale,
  especially as we can control the redo path using a callback part of
  PgStat_KindInfo. For custom kinds, this enforces some validation
  based on whether the stats library has really been loaded at startup
  with shared_preload_libraries.
- It is nice for each out-of-core stats kind to not have to load and
  define a custom RMGR, duplicating what this central facility is
  doing.
- The persistency of the stats data is better across crashes: this
  approach makes the cleanup of orphaned pgstats entries easier, as
  this can be enforced at replay by each stats kind, if necessary, and
  it is also possible to force the creation of stats data.
- For variable-numbered stats, the WAL-logging can be timed with the
  moment when any pending stats are flushed, if there are any.

Table stats, to prevent autovacuum from committing seppuku for tables
without stats after a crash, are the first use case I can think about,
where we'd want to split and expose some DML stats. Scan stats should
remain untouched on standbys, for example. There may be other stats
data that is worth replicating; all of these could use this new
facility.

Attached is the result of this investigation, where I have finished by
implementing a new backend facility with which pgstats kinds can
WAL-log data at will, using a new RMGR called "PgStat". The most
interesting part is pgstat_xlog_data() in a new file called
pgstat_xlog.c, which stats kinds can use as an entry point to WAL-log
data structures with a XLogRegisterData() managed by a new record
structure called xl_pgstat_data.

For clarity, the patch set has been split into several pieces, I hope
rather edible:
- 0001, a fix I've posted on a different thread [1], used in patch 0004
  to test this new facility.
- 0002, a refactoring piece to be able to expose stats kind data to the
  frontend (for pg_waldump).
- 0003 is the core backend piece, introducing the WAL-logging routine
  and the new RMGR.
- 0004, which provides tests for the new backend facility via a custom
  stats kind. Requires 0001.

I am going to attach it to the next CF. Comments or opinions are
welcome.

[1]: https://www.postgresql.org/message-id/z23zce4w1djuk...@paquier.xyz
--
Michael

From dc7f5ac52407c8916a560572cfb2c0adf183dbab Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Fri, 27 Dec 2024 09:16:02 +0900
Subject: [PATCH v1 1/4] injection_points: Tweak variable-numbered stats to
 work with pending data

As coded, the module was not using pending entries to store its data
locally before a flush to the central dshash is done with a timed
pgstat_report_stat() call. Hence, the flush callback was defined, but
ended up unused.
As a template, this is more efficient than the original logic of
updating the shared memory entries directly, as it reduces the
interactions that need to be done with the pgstats hash table in shared
memory.

injection_stats_flush_cb() was also missing a pgstat_unlock_entry(), so
add one, while on it.
---
 .../modules/injection_points/injection_stats.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index e16b9db284..21d5c10f39 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -80,6 +80,9 @@ injection_stats_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 		return false;

 	shfuncent->stats.numcalls += localent->numcalls;
+
+	pgstat_unlock_entry(entry_ref);
+
 	return true;
 }

@@ -127,13 +130,13 @@ pgstat_create_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;

-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-											PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+										  PGSTAT_INJ_IDX(name), NULL);
+
 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;

 	/* initialize shared memory data */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-	pgstat_unlock_entry(entry_ref);
 }

 /*
@@ -168,16 +171,14 @@ pgstat_report_inj(const char *name)
 	if (!inj_stats_loaded || !inj_
Re: An improvement of ProcessTwoPhaseBuffer logic
On Thu, Dec 26, 2024 at 06:11:25PM +0300, Давыдов Виталий wrote:
> I am concerned about twophase file name generation. The function
> TwoPhaseFilePath() is pretty straightforward and unambiguous in 16 and
> earlier versions. The logic of this function in 17+ seems to be more
> complex, and I do not understand it clearly. But, I guess, it will
> work incorrectly after turning over to a newer epoch, because the
> epoch is calculated from TransamVariables->nextXid, not from the
> problem xid. The same problem may happen if we are in epoch 1 or
> greater: it will produce a wrong file name, because the epoch will be
> obtained from the last xid, not from the file name xid. In other
> words, the function AdjustToFullTransactionId assumes that if xid >
> TransamVariables->nextXid, then the xid is from the previous epoch.
> It may not be the case in our scenario.

Yeah. At this stage, we can pretty much say that the whole idea of
relying on AdjustToFullTransactionId() is broken, because we would
build 2PC file names based on wrong assumptions, while orphaned files
could be in the far past or far future depending on the epoch.

TBH, we'll live better if we remove AdjustToFullTransactionId() and
sprinkle the knowledge of FullTransactionIds a bit more, so as to build
the 2PC file path correctly in v17~. I've been playing with this code
for a couple of hours and finished with the attached patch.

I have wondered if ReadTwoPhaseFile() should gain the same knowledge as
TwoPhaseFilePath(), but decided to limit the invasiveness, because we
always call ReadTwoPhaseFile() once we don't have any orphaned files,
for all the phases of recovery. So while this requires a bit more logic
that depends on FullTransactionIdFromEpochAndXid() and
ReadNextFullTransactionId() to build a FullTransactionId and get the
current epoch, that's still acceptable, as we only store an XID in the
2PC file.

So please see the attached. You will note that RemoveTwoPhaseFile(),
ProcessTwoPhaseBuffer() and TwoPhaseFilePath() now require a
FullTransactionId, hence callers need to think about the epoch to use.
That should limit future errors compared to passing the file name as an
optional argument.

What do you think?
--
Michael

From bb250c0cd303f0d8db0fee2d26353bf0e0cb1275 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Fri, 27 Dec 2024 16:34:44 +0900
Subject: [PATCH v2] Fix failure with incorrect epoch handling for 2PC files
 at recovery

At the beginning of recovery, an orphaned two-phase file in an epoch
different than the one defined in the checkpoint record could not be
removed, based on the assumption that AdjustToFullTransactionId()
relies on, assuming that all files would be either from the current
epoch or the epoch before that.

If the checkpoint epoch was 0 while the 2PC file was orphaned and in
the future, AdjustToFullTransactionId() would underflow the epoch used
to build the 2PC file path. In non-assert builds, this would create a
WARNING message referring to a 2PC file with an epoch of "" (or
UINT32_MAX), as an effect of the underflow calculation.

Some tests are added with dummy 2PC files in the past and the future,
checking that these are properly removed.

Issue introduced by 5a1dfde8334b.
Reported-by: Vitaly Davydov
Discussion: https://postgr.es/m/13b5b6-676c3080-4d-531db900@47931709
Backpatch-through: 17
---
 src/backend/access/transam/twophase.c | 231 +-
 src/test/recovery/t/009_twophase.pl   |  31 
 2 files changed, 184 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 49be1df91c..01f5d728be 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -221,13 +221,13 @@ static void ProcessRecords(char *bufptr, TransactionId xid,
 static void RemoveGXact(GlobalTransaction gxact);

 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
-static char *ProcessTwoPhaseBuffer(TransactionId xid,
+static char *ProcessTwoPhaseBuffer(FullTransactionId xid,
								   XLogRecPtr prepare_start_lsn,
								   bool fromdisk, bool setParent,
								   bool setNextXid);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
								const char *gid, TimestampTz prepared_at, Oid owner,
								Oid databaseid);
-static void RemoveTwoPhaseFile(TransactionId xid, bool giveWarning);
+static void RemoveTwoPhaseFile(FullTransactionId fxid, bool giveWarning);
 static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len);

 /*
@@ -926,42 +926,9 @@ TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
 /* State file support */
 //

-/*
- * Compute the FullTransactionId for the given TransactionId.
- *
- * The wrap logic is safe here because the span of active xids cannot exceed one
- * epoch at any given time.
- */
-static inline FullTransactionId
-AdjustToFullTransactionId(TransactionId xid)
-{
-
Re: Add Postgres module info
On 12/24/24 10:42, Yurii Rashkovskii wrote:
> On Mon, Dec 16, 2024 at 12:02 PM Andrei Lepikhov wrote:
> I've reviewed the patch, and it is great that you support more flexible
> versioning now. I am just wondering a bit about the case where
> `minfo->name` can be `NULL` but `minfo->version` isn't, or where both
> are `NULL` – should we skip any of these?

It depends. I wrote the code that way so as not to restrict a
maintainer by requiring all the fields to be initialised; remember, the
structure may grow in the future.

But I am open to changing that logic. Do you have any specific rule on
which fields may be empty and which must be initialised? Do you think a
maintainer must fill all fields with non-zero-length constants?

Also, I've added this patch to commitfest:
https://commitfest.postgresql.org/51/5465/

--
regards, Andrei Lepikhov
Re: date_trunc invalid units with infinite value
On Tue, Dec 24, 2024 at 04:33:47PM +0900, Michael Paquier wrote:
> I am planning to get this one applied around the end of this week on
> Friday for HEAD, that should be enough if there are comments and/or
> objections.

And done for now. If there are any remarks and/or objections, of course
feel free.
--
Michael
Re: Add Postgres module info
On Fri, Dec 27, 2024 at 8:34 AM Andrei Lepikhov wrote:
> On 12/24/24 10:42, Yurii Rashkovskii wrote:
> > I've reviewed the patch, and it is great that you support more flexible
> > versioning now. I am just wondering a bit about the case where
> > `minfo->name` can be `NULL` but `minfo->version` isn't, or where both are
> > `NULL` – should we skip any of these?
> It depends. I wrote the code that way so as not to force a maintainer to
> initialise all the fields; remember, the structure may grow in the future.
> But I am open to changing that logic. Do you have any specific rule for
> which fields may be empty and which must be initialised? Do you think the
> maintainer must fill all fields with non-zero-length constants?

After more thinking, I'll concede that not doing anything about null metadata is probably better – making the function always return the list of modules, regardless of whether any metadata was supplied. It's beneficial to be able to get the entire list of modules regardless of metadata.

The only other minor concern I have left is that some modules might have a clashing name, or may change their name during the extension's lifetime (this happened to some of my early work). Providing both a permanent identifier and a human-digestible identifier may be worth it.
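To make the "always return the row, NULL out the missing metadata" behavior concrete, a sketch of how the reporting function could fill one output tuple. This is only a fragment for illustration; the variable names and column positions are assumptions, not the patch's actual code:

	/*
	 * Sketch: one output row per loaded module; metadata the maintainer
	 * did not initialise becomes NULL instead of suppressing the row.
	 * "minfo" is the module's info struct, "values"/"nulls" the usual
	 * SRF output arrays.
	 */
	values[0] = CStringGetTextDatum(library_name);	/* always known */

	if (minfo->name != NULL)
		values[1] = CStringGetTextDatum(minfo->name);
	else
		nulls[1] = true;

	if (minfo->version != NULL)
		values[2] = CStringGetTextDatum(minfo->version);
	else
		nulls[2] = true;

This keeps the full module list visible while leaving the validation policy (if any) to the maintainer.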
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, 24 Dec 2024 at 17:07, Nisha Moond wrote:
>
> On Fri, Dec 20, 2024 at 3:12 PM Amit Kapila wrote:
>>
>> On Mon, Dec 16, 2024 at 4:10 PM Nisha Moond wrote:
>> >
>> > Here is the v56 patch set with the above comments incorporated.
>> >
>>
>> Review comments:
>> ===============
>> 1.
>> + {
>> + {"idle_replication_slot_timeout", PGC_SIGHUP, REPLICATION_SENDING,
>> + gettext_noop("Sets the duration a replication slot can remain idle before "
>> + "it is invalidated."),
>> + NULL,
>> + GUC_UNIT_MS
>> + },
>> + &idle_replication_slot_timeout_ms,
>>
>> I think users are going to keep the idle slot timeout at least in hours.
>> So, milliseconds seem the wrong choice to me. I suggest keeping the
>> unit in minutes. I understand that writing a test would be
>> challenging, as spending a minute or more on one test is not advisable.
>> But I don't see any tests for the other GUCs that are in minutes
>> (wal_summary_keep_time and log_rotation_age). The default value should
>> be one day.
>
> +1
> - Changed the GUC unit to "minute".
>
> Regarding the tests, we have two potential options:
> 1) Introduce an additional "debug_xx" GUC parameter with units of seconds
> or milliseconds, only for testing purposes.
> 2) Skip writing tests for this, similar to other GUCs with units in minutes.
>
> IMO, adding an additional GUC just for testing may not be worthwhile. It's
> reasonable to proceed without the test.
>
> Thoughts?
>
> The attached v57 patch set addresses all the comments. I have kept the test
> case in the patch for now; it takes 2-3 minutes to complete.

A few comments:
1) The similar configuration max_slot_wal_keep_size is disabled by setting it to -1. Since this GUC is along similar lines, should we also disable it by default and let the user configure it?

+/*
+ * Invalidate replication slots that have remained idle longer than this
+ * duration; '0' disables it.
+ */
+int		idle_replication_slot_timeout_min = HOURS_PER_DAY * MINS_PER_HOUR;
+

2) I felt this is existing behavior, so this change could also be moved to the 0001 patch:

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index a586156614..199d7248ee 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2566,7 +2566,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        The time when the slot became inactive. NULL if the
-       slot is currently being streamed.
+       slot is currently being streamed. If the slot becomes invalidated,
+       this value will remain unchanged until server shutdown.

3) Can we change the comment below to "We don't allow the value of idle_replication_slot_timeout other than 0 during the binary upgrade. See start_postmaster() in pg_upgrade for more details.":

+ * The idle_replication_slot_timeout must be disabled (set to 0)
+ * during the binary upgrade.
+ */
+bool
+check_idle_replication_slot_timeout(int *newval, void **extra, GucSource source)

Regards,
Vignesh
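For reference, the minute-based variant of the guc_tables.c entry discussed above could look roughly like this. A sketch only: the boot value follows the review's "one day" suggestion, while the range and hook wiring are assumptions:

	static int	idle_replication_slot_timeout_min = HOURS_PER_DAY * MINS_PER_HOUR;

	/* Sketch: GUC_UNIT_MIN instead of GUC_UNIT_MS */
	{
		{"idle_replication_slot_timeout", PGC_SIGHUP, REPLICATION_SENDING,
			gettext_noop("Sets the duration a replication slot can remain idle "
						 "before it is invalidated."),
			NULL,
			GUC_UNIT_MIN
		},
		&idle_replication_slot_timeout_min,
		HOURS_PER_DAY * MINS_PER_HOUR,	/* boot value: one day, per review */
		0, INT_MAX / SECS_PER_MINUTE,	/* 0 disables the timeout */
		check_idle_replication_slot_timeout, NULL, NULL
	},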
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote:
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
>
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).

+DefineCustomIntVariable("stats_history.size",
+                        "Sets the amount of memory available for past events.",

How about some time-based retention? Data size can be hard for the end user to reason about, while there would be a set of users who would want to retain data for the past week, month, etc. If both a size and a time upper bound are defined, then entries that match one condition or the other are removed.

+checkpoint_log_hook(
+    CheckpointStats.ckpt_start_t,	/* start_time */
+    CheckpointStats.ckpt_end_t,	/* end_time */
+    (flags & CHECKPOINT_IS_SHUTDOWN),	/* is_shutdown */
+    (flags & CHECKPOINT_END_OF_RECOVERY),	/* is_end_of_recovery */
+    (flags & CHECKPOINT_IMMEDIATE),	/* is_immediate */
+    (flags & CHECKPOINT_FORCE),	/* is_force */
+    (flags & CHECKPOINT_WAIT),	/* is_wait */
+    (flags & CHECKPOINT_CAUSE_XLOG),	/* is_wal */
+    (flags & CHECKPOINT_CAUSE_TIME),	/* is_time */
+    (flags & CHECKPOINT_FLUSH_ALL),	/* is_flush_all */
+    CheckpointStats.ckpt_bufs_written,	/* buffers_written */
+    CheckpointStats.ckpt_slru_written,	/* slru_written */
+    CheckpointStats.ckpt_segs_added,	/* segs_added */
+    CheckpointStats.ckpt_segs_removed,	/* segs_removed */
+    CheckpointStats.ckpt_segs_recycled,	/* segs_recycled */

That's a lot of arguments. CheckpointStatsData and the various CHECKPOINT_* flags are exposed, so why not just send these values to the hook? For v1-0001 as well, I'd suggest some grouping with existing structures, or exposing these structures so that they can be reused by out-of-core code via the proposed hook. More arguments lead to more mistakes that could easily be avoided.
--
Michael
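A sketch of the struct-based alternative being suggested here; the hook name is reused from the patch, while the exact signature is an assumption:

	/* Sketch: hand the hook the flags and the whole stats struct */
	typedef void (*checkpoint_log_hook_type) (int flags,
											  CheckpointStatsData *stats);

	extern PGDLLIMPORT checkpoint_log_hook_type checkpoint_log_hook;

	/* call site, replacing the ~15 scalar arguments */
	if (checkpoint_log_hook)
		(*checkpoint_log_hook) (flags, &CheckpointStats);

Since CheckpointStatsData and the CHECKPOINT_* flag bits are already exported, the hook implementation can pick apart whatever it needs, and adding a field to the struct later does not break the hook signature.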
Re: attndims, typndims still not enforced, but make the value within a sane threshold
On Thu, Dec 26, 2024 at 06:07:45PM -0500, Bruce Momjian wrote:
> On Thu, Dec 26, 2024 at 05:08:36PM -0500, Tom Lane wrote:
>> I think removal is not happening. The documentation for attndims
>> already says
>>
>>     Number of dimensions, if the column is an array type; otherwise
>>     0. (Presently, the number of dimensions of an array is not
>>     enforced, so any nonzero value effectively means “it's an array”.)
>>
>> and as far as I know that is completely correct. (Hence, your gripe
>> is not that it's not "accurate", rather that it's not "precise".)
>>
>> There isn't a comparable disclaimer under typndims, but there probably
>> should be.

FWIW, basically all the applications I've seen rely on attndims and typndims only for non-zero comparisons, to save joins with other catalogs.

> Using the queries in that URL, I see:
>
>     CREATE TABLE test (data integer, data_array integer[5][5]);
>     CREATE TABLE test2 (LIKE test);
>     CREATE TABLE test3 AS SELECT * FROM test;
>
>     SELECT relname, attndims
>     FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
>     WHERE attname = 'data_array';
>
>      relname | attndims
>     ---------+----------
>      test    |        2
> -->  test2   |        0
> -->  test3   |        0

Ugh. LIKE and CTAS are wrong here with what they add in pg_attribute.

> and run the query again I get:
>
>     SELECT relname, attndims
>     FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
>     WHERE attname = 'data_array';
>
>      relname | attndims
>     ---------+----------
>      test    |        1
>      test2   |        1
>      test3   |        1
>
> There is clearly something not ideal about this behavior.

Still, these are better post-restore.
--
Michael
Fix handling of injection_points regarding pending stats
Hi all,

While doing more work related to pgstats and WAL, I have bumped into the fact that the injection_points module registers a pending-flush callback for its variable-numbered stats, but it does not stage its entries locally first, with a final flush timed by pgstat_report_stat() that would itself call the pending-flush callback. This is a less efficient practice, because it requires touching the pgstats dshash more often than necessary. It is caused by the use of pgstat_get_entry_ref_locked() rather than pgstat_prep_pending_entry(), which would make sure that the flush of any pending entries is timed appropriately by pgstat_report_stat().

I'd like to fix the module so that the flush timing is controlled like for all the other stats kinds, which is a better and more efficient practice to have anyway, as this code should stand as a template for custom pgstats kinds.

While doing so, I have also noticed that injection_stats_flush_cb() missed a pgstat_unlock_entry() after locking an entry.

All of that is addressed in the attached patch. Feel free to comment.

Thanks,
--
Michael

From d0aae1f873ec4f0cde91d851d131767e7ca66171 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Fri, 27 Dec 2024 09:16:02 +0900
Subject: [PATCH] injection_points: Tweak variable-numbered stats to work with
 pending data

As coded, the module was not using pending entries to store its data
locally before a flush to the central dshash done by a timed
pgstat_report_stat() call. Hence, the flush callback was defined but
ended up unused. As a template, the new approach is more efficient
than the original logic of directly updating the shared memory
entries, as it reduces the interactions with the pgstats hash table in
shared memory.

injection_stats_flush_cb() was also missing a pgstat_unlock_entry(),
so add one, while on it.
---
 .../modules/injection_points/injection_stats.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index e16b9db284..21d5c10f39 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -80,6 +80,9 @@ injection_stats_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 		return false;

 	shfuncent->stats.numcalls += localent->numcalls;
+
+	pgstat_unlock_entry(entry_ref);
+
 	return true;
 }

@@ -127,13 +130,13 @@ pgstat_create_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;

-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-											PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+										  PGSTAT_INJ_IDX(name), NULL);
+
 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;

 	/* initialize shared memory data */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-	pgstat_unlock_entry(entry_ref);
 }

 /*
@@ -168,16 +171,14 @@ pgstat_report_inj(const char *name)
 	if (!inj_stats_loaded || !inj_stats_enabled)
 		return;

-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_INJECTION, InvalidOid,
-											PGSTAT_INJ_IDX(name), false);
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_INJECTION, InvalidOid,
+										  PGSTAT_INJ_IDX(name), NULL);

 	shstatent = (PgStatShared_InjectionPoint *) entry_ref->shared_stats;
 	statent = &shstatent->stats;

 	/* Update the injection point statistics */
 	statent->numcalls++;
-
-	pgstat_unlock_entry(entry_ref);
 }

 /*
--
2.45.2
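For anyone using this module as a template for a custom stats kind, the canonical pending-entry update path boils down to something like this. A sketch with an assumed kind and struct (PGSTAT_KIND_EXAMPLE and PgStat_ExamplePending are not real names); only pgstat_prep_pending_entry() and the kind's flush callback touch the shared machinery:

	/*
	 * Sketch: count one event for a custom stats kind.  The counter is
	 * staged locally in entry_ref->pending; pgstat_report_stat() later
	 * invokes the kind's flush_pending_cb to push it into the shared
	 * dshash, so no shared lock is taken on this hot path.
	 */
	static void
	example_stats_count(uint64 key)
	{
		PgStat_EntryRef *entry_ref;
		PgStat_ExamplePending *pending;

		entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_EXAMPLE, InvalidOid,
											  key, NULL);
		pending = (PgStat_ExamplePending *) entry_ref->pending;
		pending->numcalls++;
	}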
Re: attndims, typndims still not enforced, but make the value within a sane threshold
Bruce Momjian writes:
> So, if users are referencing attndims and the values are not accurate,
> how useful are they? I was suggesting removal so people would stop
> relying on inaccurate information, or we correct attndims to be
> accurate. Allowing people to continue relying on inaccurate information
> seems like the worst of all options.

I think removal is not happening. The documentation for attndims already says

    Number of dimensions, if the column is an array type; otherwise
    0. (Presently, the number of dimensions of an array is not
    enforced, so any nonzero value effectively means “it's an array”.)

and as far as I know that is completely correct. (Hence, your gripe is not that it's not "accurate", rather that it's not "precise".)

There isn't a comparable disclaimer under typndims, but there probably should be.

			regards, tom lane
Re: attndims, typndims still not enforced, but make the value within a sane threshold
On Thu, Dec 26, 2024 at 05:08:36PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > So, if users are referencing attndims and the values are not accurate,
> > how useful are they? I was suggesting removal so people would stop
> > relying on inaccurate information, or we correct attndims to be
> > accurate. Allowing people to continue relying on inaccurate information
> > seems like the worst of all options.
>
> I think removal is not happening. The documentation for attndims
> already says
>
>     Number of dimensions, if the column is an array type; otherwise
>     0. (Presently, the number of dimensions of an array is not
>     enforced, so any nonzero value effectively means “it's an array”.)
>
> and as far as I know that is completely correct. (Hence, your gripe
> is not that it's not "accurate", rather that it's not "precise".)
>
> There isn't a comparable disclaimer under typndims, but there probably
> should be.

First, my apologies about the URL; it should have been:

	https://www.postgresql.org/message-id/zvwi_ozt8z9mc...@momjian.us

Using the queries in that URL, I see:

	CREATE TABLE test (data integer, data_array integer[5][5]);
	CREATE TABLE test2 (LIKE test);
	CREATE TABLE test3 AS SELECT * FROM test;

	SELECT relname, attndims
	FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
	WHERE attname = 'data_array';

	 relname | attndims
	---------+----------
	 test    |        2
-->	 test2   |        0
-->	 test3   |        0

Interestingly, if I dump and restore with:

	$ createdb test2; pg_dump test | psql test2

and run the query again I get:

	SELECT relname, attndims
	FROM pg_class JOIN pg_attribute ON (pg_attribute.attrelid = pg_class.oid)
	WHERE attname = 'data_array';

	 relname | attndims
	---------+----------
	 test    |        1
	 test2   |        1
	 test3   |        1

There is clearly something not ideal about this behavior.

--
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
Re: Support for unsigned integer types
On Sat, Dec 7, 2024 at 11:24:37AM -0500, Tom Lane wrote:
> Jack Bay writes:
> > Would it be possible to add support for unsigned 64-bit and unsigned
> > 32-bit integers to postgresql?
>
> This has been discussed before, and we've concluded that the impact
> on the numeric promotion hierarchy (that is, implicit-cast rules
> among the integer types) would probably be catastrophic, leading
> to problems like ambiguous-operator errors in many cases that were
> fine before. Quick, is "42 + 1" an int32 or uint32 operation?
>
> That could be avoided perhaps by measures like not having any
> implicit casts between the int and uint hierarchies, but then
> there'd be a corresponding loss of usability for the uint types.
>
> Plus, the sheer magnitude of effort needed to build out a reasonable
> set of support (functions, operators, opclasses) for uint types seems
> daunting.
>
> On the flip side, it'd be great to be able to use uint32 instead
> of bigint for the SQL representation of types like BlockNumber.
> But we couldn't roll in such a change transparently unless we make
> int-vs-uint casting fairly transparent, which seems problematic
> as per above.
>
> Perhaps a sufficiently determined and creative person could put
> together a patch that'd be accepted, but it'd be a lot of work
> for uncertain reward. I'm not aware that anyone is working on
> such a thing at present.

We do have the 'oid' data type, which is an unsigned 4-byte integer, but it lacks the casts and operator support mentioned above:

	SELECT 42 + 1;
	 ?column?
	----------
	       43

	SELECT 42::oid + 1;
	ERROR:  operator does not exist: oid + integer
	LINE 1: SELECT 42::oid + 1;
	                       ^
	HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

	SELECT 42::oid + 1::oid;
	ERROR:  operator does not exist: oid + oid
	LINE 1: SELECT 42::oid + 1::oid;
	                       ^
	HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

--
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
Re: Logical replication timeout
On Wed, 25 Dec 2024 at 13:55, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Marc,
>
> > Thanks again for this new patch.
> >
> > Unfortunately it does not compile (17.2 source):
>
> Right, because of the reason I posted [1].
>
> I have updated the patch, which takes the same approach. It passes my CI.

Let's conduct some performance tests with varying numbers of spill files (e.g., small counts like 1, 5, and 10, and larger ones like 100, 1000, and 10,000) along with different levels of concurrent transactions. We can then compare the results against the current HEAD.

Regards,
Vignesh
Re: ERROR: corrupt MVNDistinct entry
On Wed, Dec 25, 2024 at 6:36 PM Richard Guo wrote:
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

Furthermore, even without extended statistics or expressional index columns, the nullingrels can still cause issues with selectivity estimates. They may cause examine_variable to fail in identifying a valid RelOptInfo for an expression, making the relation size unavailable, which is required in many cases. For instance,

	create table t (a int, b int);
	insert into t select i, i from generate_series(1,10)i;
	analyze t;

	-- in v16 and later
	explain (costs on)
	select * from t t1 left join t t2 on true
	  left join t t3 on t1.a = t3.a and t3.a < 8
	where t1.a = coalesce(t2.a);
	                              QUERY PLAN
	---------------------------------------------------------------------
	 Nested Loop Left Join  (cost=0.00..4.94 rows=1 width=24)
	   Join Filter: (t1.a = t3.a)
	   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=1 width=16)
	         Filter: (t1.a = COALESCE(t2.a))
	         ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
	         ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
	               ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
	   ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
	         Filter: (a < 8)
	(9 rows)

	-- in v15 and before
	explain (costs on)
	select * from t t1 left join t t2 on true
	  left join t t3 on t1.a = t3.a and t3.a < 8
	where t1.a = coalesce(t2.a);
	                              QUERY PLAN
	---------------------------------------------------------------------
	 Hash Left Join  (cost=1.21..5.04 rows=10 width=24)
	   Hash Cond: (t1.a = t3.a)
	   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=10 width=16)
	         Filter: (t1.a = COALESCE(t2.a))
	         ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
	         ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
	               ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
	   ->  Hash  (cost=1.12..1.12 rows=7 width=8)
	         ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
	               Filter: (a < 8)
	(10 rows)

In v16 and later, when calculating the join selectivity for "t1.a = coalesce(t2.a)", eqjoinsel sets nd2 (the number of distinct values of coalesce(t2.a)) to DEFAULT_NUM_DISTINCT (200), because the corresponding RelOptInfo is not identifiable. This results in very inaccurate join selectivity.

I'm wondering if we also need to strip out the nullingrels from the expression in examine_variable(). I tried doing so and noticed a plan diff in regression test join.sql.

	@@ -2573,10 +2573,11 @@
	                     ->  Materialize
	                           ->  Seq Scan on int4_tbl t2
	                                 Filter: (f1 > 1)
	-         ->  Seq Scan on int4_tbl t3
	+         ->  Materialize
	+               ->  Seq Scan on int4_tbl t3
	          ->  Materialize
	                ->  Seq Scan on int4_tbl t4
	-(13 rows)
	+(14 rows)

Thanks
Richard
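If stripping is the way to go, the change could be as small as running the expression through the existing helper before matching. A sketch only; the right placement inside examine_variable() and the exact relid set to strip would need care:

	/*
	 * Sketch: drop nullingrels decoration before searching for a matching
	 * expression, so that "t2.a + t2.b" carrying nullingrels still matches
	 * the expression stored for extended statistics or an index.
	 */
	node = remove_nulling_relids(node,
								 root->outer_join_rels,	/* relids to strip; an assumption */
								 NULL);					/* no exceptions */

remove_nulling_relids() already exists for exactly this kind of cleanup; the open question is whether stripping unconditionally here is safe for estimates, given the join.sql plan diff above.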
Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.
On Fri, Dec 27, 2024 at 11:30 AM vignesh C wrote:
>
> The documentation requires a minor update: instead of specifying
> subscriptions, the user specifies multiple databases, and subscriptions
> are created for the specified databases. The documentation should be
> updated accordingly:
> +        Enables <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
> +        commit for the subscription. If there are multiple subscriptions
> +        specified, this option applies to all of them.
> +        The default is false.

I have fixed the given comment. The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.
Re: Windows meson build
Kohei Harikae (Fujitsu) wrote on 2024-12-27 04:31:
> Hi,
>
> Thank you for your advice.
> I added a sample setting of PKG_CONFIG_PATH based on your advice.
>
> What do you think about this?
>
> Regards,
> Kohei Harikae

Hi,

From my point of view it looks good: the up-to-date information is added and the excessive information is removed.

I think that in the phrase "add the directory where the .pc file resides to PKG_CONFIG_PATH", it would be better to add wording saying that PKG_CONFIG_PATH is a list of directories separated by the PATH separator (";" on Windows, ":" on POSIX). Otherwise the reader has to search for additional information about this variable in the meson documentation.

--
Best regards,
Vladlen Popolitov.
Re: Pass ParseState as down to utility functions.
On Wed, Dec 25, 2024 at 5:29 PM Michael Paquier wrote:
>
> On Mon, Dec 16, 2024 at 05:25:45PM +0800, jian he wrote:
> > I've split it into 3 patches; feel free to merge them in any way.
> > v12-0001: add error position for ATPrepAlterColumnType.
>
> For this one, why don't you do the same for undefined columns and
> USING with generated columns at least? This looks half-baked.

I think I understand what you mean. Please check the attached for the changes within ATPrepAlterColumnType.

> > v12-0002: add error position for these 3 functions:
> > transformColumnDefinition, transformAlterTableStmt,
> > transformTableConstraint.
>
> ERROR:  column "c" of relation "itest4" does not exist
> +LINE 1: ALTER TABLE itest4 ALTER COLUMN c ADD GENERATED ALWAYS AS ID...
> +                                          ^
>
> This one is kind of confusing? The part that matters for the error is
> the column that does not exist, not the ADD GENERATED.

I agree this is confusing.

While looking at 0002: for

	errmsg("conflicting NO INHERIT declarations for not-null constraints on column \"%s\"",
		   column->colname)

adding parser_errposition will not be very helpful. I think we need an errhint. For example:

	create table notnull_tbl_fail (a int primary key constraint foo not null no inherit);
	ERROR:  conflicting NO INHERIT declarations for not-null constraints on column "a"

The error message does not explicitly say that the primary key implies an inherited not-null constraint. Maybe we can change it to:

	errmsg("conflicting NO INHERIT declarations for not-null constraints on column \"%s\"",
		   column->colname),
	errhint("The specified primary key or identity sequence implies that an inherited not-null constraint will be created.")

What do you think?

From a8055a5292a57e111790629d418fe1663e1ba8ec Mon Sep 17 00:00:00 2001
From: jian he
Date: Fri, 27 Dec 2024 13:36:06 +0800
Subject: [PATCH v13 1/1] add error position for ATPrepAlterColumnType
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Within ATPrepAlterColumnType(), add parser_errposition() to the various
ereport(ERROR, ...) calls so that the error position is printed.
Author: Kirill Reshke
Author: Jian He
Reviewed-By: Michaël Paquier
Reviewed-By: Álvaro Herrera
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
---
 src/backend/commands/tablecmds.c           | 25 +++
 src/test/regress/expected/alter_table.out  | 14 +++
 .../regress/expected/generated_stored.out  |  2 ++
 src/test/regress/expected/typed_table.out  |  2 ++
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4937478262..f0acb008ea 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13222,10 +13222,12 @@ ATPrepAlterColumnType(List **wqueue,
 	AclResult	aclresult;
 	bool		is_expr;

+	pstate->p_sourcetext = context->queryString;
 	if (rel->rd_rel->reloftype && !recursing)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("cannot alter column type of typed table")));
+				 errmsg("cannot alter column type of typed table"),
+				 parser_errposition(pstate, def->location)));

 	/* lookup the attribute so we can check inheritance status */
 	tuple = SearchSysCacheAttName(RelationGetRelid(rel), colName);
@@ -13233,7 +13235,8 @@ ATPrepAlterColumnType(List **wqueue,
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_COLUMN),
 				 errmsg("column \"%s\" of relation \"%s\" does not exist",
-						colName, RelationGetRelationName(rel))));
+						colName, RelationGetRelationName(rel)),
+				 parser_errposition(pstate, def->location)));

 	attTup = (Form_pg_attribute) GETSTRUCT(tuple);
 	attnum = attTup->attnum;
@@ -13241,8 +13244,8 @@ ATPrepAlterColumnType(List **wqueue,
 	if (attnum <= 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot alter system column \"%s\"",
-						colName)));
+				 errmsg("cannot alter system column \"%s\"", colName),
+				 parser_errposition(pstate, def->location)));

 	/*
 	 * Cannot specify USING when altering type of a generated column, because
@@ -13252,7 +13255,8 @@ ATPrepAlterColumnType(List **wqueue,
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_COLUMN_DEFINITION),
 				 errmsg("cannot specify USING when altering type of generated column"),
-				 errdetail("Column \"%s\" is a generated column.", colName)));
+				 errdetail("Column \"%s\" is a generated column.", colName),
+				 parser_errposition(pstate, def->location)));

 	/*
 	 * Don't alter inherited columns. At outer level, there had better not be
@@ -13262,8 +13266,8 @@ ATPrepAlterColumnType(List **wqueue,
 	if (attTup->attinhcount > 0 && !recursing)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-				 errmsg("cannot alter inherited column \"%s\"",
-						colName)));
+				 errmsg("cannot alter inherited column \"%s\"", colName),
+				 parser_errpositi
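To make the errhint proposed for 0002 above concrete, the report in transformColumnDefinition() might end up looking roughly like this. A sketch only: the error code, the pstate access via cxt, and the final wording are assumptions:

	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("conflicting NO INHERIT declarations for not-null constraints on column \"%s\"",
					column->colname),
			 errhint("The specified primary key or identity sequence implies an inherited not-null constraint."),
			 parser_errposition(cxt->pstate, constraint->location)));

The hint carries the explanation that the primary key itself creates the inherited not-null constraint, while parser_errposition() points at the conflicting constraint rather than at the whole command.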
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Hi Tomas,

> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...

Agreed.

> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).

Yes. What I mean is that the maximum value is not friendly to large instances if the limit is set too small. In real production instances one often encounters many sub-tables with the same table structure.

On Fri, Dec 27, 2024 at 1:58 AM Tomas Vondra wrote:
> On 12/26/24 17:00, wenhui qiu wrote:
> > Hi
> >
> > As far as I know, instances with more than 10,000 tables are often
> > encountered, so I insist that the maximum can be appropriately increased
> > to 256MB, to better adapt to many-table situations.
>
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
>
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
>
> regards
>
> --
> Tomas Vondra
Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.
On Fri, 13 Dec 2024 at 15:33, Shubham Khanna wrote:
>
> On Fri, Dec 13, 2024 at 2:39 PM vignesh C wrote:
> >
> > On Fri, 13 Dec 2024 at 13:01, Shubham Khanna wrote:
> > >
> > > On Fri, Dec 13, 2024 at 12:20 PM Peter Smith wrote:
> > > >
> > > > Hi Shubham.
> > > >
> > > > Here are my review comments for v6-0001.
> > > >
> > > > ==========
> > > > 1.
> > > > +# Verify that the subtwophase is enabled ('e') in the pg_subscription catalog
> > > > +$node_s->poll_query_until('postgres',
> > > > +  "SELECT count(1) = 2 FROM pg_subscription WHERE subtwophasestate = 'e';")
> > > > +  or die "Timed out while waiting for subscriber to enable twophase";
> > > > +
> > > >
> > > > This form of the SQL is probably OK, but it's a bit tricky; it probably
> > > > should have been explained in the comment where that count "2"
> > > > has come from.
> > > >
> > > > ~~
> > > >
> > > > I think it was OK as before (v5) to be querying until nothing was NOT
> > > > 'e'. In other words, until everything was enabled 'e'.
> > > > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');
> > > >
> > > > ~~
> > > >
> > > > OTOH, to save execution time we really would be satisfied with both
> > > > 'p' and 'e' states here (we don't strictly need to wait for the
> > > > transition from 'p' to 'e' to occur).
> > > >
> > > > So, SQL like the one below might be the best:
> > > >
> > > > # Verify that all subtwophase states are pending or enabled,
> > > > # e.g. there are no subscriptions where subtwophase is disabled ('d').
> > > > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate IN ('d')
> > > >
> > >
> > > I have fixed the given comment. Since prepared transactions are not
> > > being used anymore, I have removed them from the test file.
> > > The attached patch contains the suggested changes.
> >
> > A few comments:
> > 1) Since we are providing the --enable-two-phase option now and the user
> > can create subscriptions with the two-phase option, this warning can be
> > removed:
> >
> > -   pg_createsubscriber sets up logical
> > -   replication with two-phase commit disabled. This means that any
> > -   prepared transactions will be replicated at the time
> > -   of COMMIT PREPARED, without advance preparation.
> > -   Once setup is complete, you can manually drop and re-create the
> > -   subscription(s) with
> > +   If --enable-two-phase is not specified, the
> > +   pg_createsubscriber sets up logical replication
> > +   with two-phase commit disabled. This means that any prepared transactions
> > +   will be replicated at the time of COMMIT PREPARED,
> > +   without advance preparation. Once setup is complete, you can manually drop
> > +   and re-create the subscription(s) with the
> >     <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
> > -   option enabled.
> >
> > 2) Since we are not going to wait till the subscriptions are enabled,
> > we can use safe_psql instead of poll_query_until, something like:
> >
> > is($node_s->safe_psql('postgres',
> >   "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate = 'd'"),
> >   't', 'subscriptions are created with the two-phase option enabled');
> >
> > instead of:
> > +$node_s->poll_query_until('postgres',
> > +       "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate IN ('d');"
> > +);
> >
> > 3) This can be removed from the commit message:
> > Documentation has been updated to reflect the new option, and test cases
> > have been added to validate various scenarios, including proper validation
> > of the '--enable-two-phase' option and its combinations with other options.
> >
> > 4) Should "two-phase option" be "enable-two-phase option" here:
> >     const char *sub_username;   /* subscriber username */
> > +   bool        two_phase;      /* two-phase option */
> >     SimpleStringList database_names;    /* list of database names */
>
> I have fixed the given comments. The attached patch contains the
> suggested changes.

The documentation requires a minor update: instead of specifying subscriptions, the user specifies multiple databases, and subscriptions are created for the specified databases. The documentation should be updated accordingly:

+        Enables <link linkend="sql-createsubscription-params-with-two-phase">two_phase</link>
+        commit for the subscription. If there are multiple subscriptions
+        specified, this option applies to all of them.
+        The default is false.

Regards,
Vignesh
> > > > 2) Since we are not going to wait till the subscriptions are enabled, > > we can use safe_psql instead of poll_query_until, something like: > > is($node_s->safe_psql('postgres', > > "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate = 'd'"), > > 't', 'subscriptions are created with the two-phase option enabled'); > > > > instead of: > > +$node_s->poll_query_until('postgres', > > + "SELECT count(1) = 0 FROM pg_subscription WHERE > > subtwophasestate IN ('d');" > > +); > > > > 3) This can be removed from the commit message: > > Documentation has been updated to reflect the new option, and test cases > > have > > been added to validate various scenarios, including proper validation of the > > '--enable-two-phase' option and its combinations with other options. > > > > 4) Should" two-phase option" be "enable-two-phase option" here: > > const char *sub_username; /* subscriber username */ > > + booltwo_phase; /* two-phase option */ > > SimpleStringList database_names;/* list of database names */ > > > > I have fixed the given comments. The attached patch contains the > suggested changes. The documentation requires a minor update: instead of specifying subscriptions, the user will specify multiple databases, and the subscription will be created on the specified databases. Documentation should be updated accordingly: + + Enables two_phase + commit for the subscription. If there are multiple subscriptions + specified, this option applies to all of them. + The default is false. Regards, Vignesh