Re: Pulling up direct-correlated ANY_SUBLINK
Hi All: On Tue, Sep 10, 2019 at 9:49 PM Tom Lane wrote: > Richard Guo writes: > > Currently we do not try to pull up sub-select of type ANY_SUBLINK if it > > refers to any Vars of the parent query, as indicated in the code snippet > > below: > > if (contain_vars_of_level((Node *) subselect, 1)) > > return NULL; > > Why do we have this check? > > Because the result would not be a join between two independent tables. > I think this situation is caused by we pull-up the ANY-sublink with 2 steps, the first step is to pull up the sublink as a subquery, and the next step is to pull up the subquery if it is allowed. The benefits of this method are obvious, pulling up the subquery has more requirements, even if we can just finish the first step, we still get huge benefits. However the bad stuff happens if varlevelsup = 1 involves, step 1 fails! The solution here is to use the lateral join to overcome the two independent tables, the issue of this solution includes: 1. LATERAL pretty much constrains things to use a nestloop like below, but this reason is questioned since if we can pull-up the subquery, if so the constraint gone. [1] 2. It has something with unique-ify the inner path. [2] , but Richard thought it should be fixed but without an agreement for all people [3]. 3. Richard [4] found it would fail to get a plan for some query. (the error is below per my testing) > ERROR: failed to build any 3-way joins So back to the root cause of this issue, IIUC, if varlevelsup = 1 involves, can we just bypass the 2-steps method, just as what we do for EXISTS sublinks? If so, we just need to convert the ANY-SUBLINK to EXIST-SUBLINK under the case. The attached is the one commit which includes the 2 methods discussed here, controlled by different GUC separately, for easy testing. Per my test, Query 2 choosed the Unique Join with the IN-to-EXISTS method, but not with the Lateral method, and query 3 raises error with the lateral method, but not with the IN-to-EXISTS method. [1] https://www.postgresql.org/message-id/flat/60794.1568104308%40antos#365d5ec69fd605a8569a2674a33909a1 [2] https://www.postgresql.org/message-id/60794.1568104308%40antos [3] https://www.postgresql.org/message-id/CAN_9JTzqa-3RmHAw3wZv099Rk8xX480YdEvGy%2BJAdVw8dTnHRA%40mail.gmail.com [4] https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS%3DD5TY%3D4KGHEQcfHPZCXS1GVhkA%40mail.gmail.com > > Can we try to pull up direct-correlated ANY SubLink with the help of > > LATERAL? > > Perhaps. But what's the argument that you'd end up with a better > plan? LATERAL pretty much constrains things to use a nestloop, > so I'm not sure there's anything fundamentally different. > > regards, tom lane > > -- Best Regards Andy Fan test.sql Description: application/sql From 1bdc3b9098851ab1f6897f497daf90c601eca27e Mon Sep 17 00:00:00 2001 From: Andy Fan Date: Sun, 9 Oct 2022 17:47:23 +0800 Subject: [PATCH v1] 2 methods for Pulling up direct-correlated ANY_SUBLINK --- .../postgres_fdw/expected/postgres_fdw.out| 24 +- src/backend/optimizer/plan/subselect.c| 7 +- src/backend/optimizer/prep/prepjointree.c | 291 ++ src/backend/utils/misc/guc_tables.c | 23 ++ src/test/regress/expected/join.out| 35 ++- src/test/regress/expected/subselect.out | 121 src/test/regress/sql/subselect.sql| 61 7 files changed, 531 insertions(+), 31 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index b3c8ce01313..870590df562 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -11377,19 +11377,19 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl) SERVER loopback OPTIONS (table_name 'base_tbl'); EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl); - QUERY PLAN -- - Seq Scan on public.base_tbl +QUERY PLAN +--- + Nested Loop Semi Join Output: base_tbl.a - Filter: (SubPlan 1) - SubPlan 1 - -> Result - Output: base_tbl.a - -> Append - -> Async Foreign Scan on public.foreign_tbl foreign_tbl_1 - Remote SQL: SELECT NULL FROM public.base_tbl - -> Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2 - Remote SQL: SELECT NULL FROM public.base_tbl + -> Seq Scan on public.base_tbl + Output: base_tbl.a, base_tbl.b + Filter: (base_tbl.a IS NOT NULL) + -> Materialize + -> Append + -> Async Foreign Scan on public.foreign_t
How to started with Contributions
Respected Sir/Madam I am Joseph Raj Vishal, a Computer Science undergrad. I have just entered third year at Dayananda Sagar College of Engineering in Bengaluru. I am new to open source contributions but I am well aware of Python, Java, Django, PHP, SQL, Javascript,Kotlin and Android. I would love to contribute to your organisation but I would need your help getting started. Hoping to hear from you soon Regards Joseph Raj Vishal
Re: How to started with Contributions
HI Joseph: Maybe the following links are helpful. https://www.postgresql.org/message-id/pine.bsf.4.21.9912052011040.823-100...@thelab.hub.org https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F On Sun, Oct 30, 2022 at 3:42 PM 1DS20CS093 Joseph Raj Vishal < josephrvis...@gmail.com> wrote: > Respected Sir/Madam > I am Joseph Raj Vishal, a Computer Science undergrad. I have just entered > third year at Dayananda Sagar College of Engineering in Bengaluru. I am new > to open source contributions but I am well aware of Python, Java, Django, > PHP, SQL, Javascript,Kotlin and Android. I would love to contribute to your > organisation but I would need your help getting started. > Hoping to hear from you soon > > Regards > Joseph Raj Vishal > -- Best Regards Andy Fan
Re: Segfault on logical replication to partitioned table with foreign children
On Sat, Oct 29, 2022 at 1:01 AM wrote: > > Right now it is possible to add a partitioned table with foreign tables > as its children as a target of a subscription. It can lead to an assert > (or a segfault, if compiled without asserts) on a logical replication > worker when the worker attempts to insert the data received via > replication into the foreign table. Reproduce with caution, the worker > is going to crash and restart indefinitely. The setup: Yes, this looks like a bug and your fix seems correct to me. It would be nice to add a test case for this scenario. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Segfault on logical replication to partitioned table with foreign children
Dilip Kumar writes: > Yes, this looks like a bug and your fix seems correct to me. It would > be nice to add a test case for this scenario. A test case doesn't seem that exciting to me. If we were trying to make it actually work, then yeah, but throwing an error isn't that useful to test. The code will be exercised by replication to a regular partitioned table (I assume we do have tests for that). What I'm wondering about is whether we can refactor this code to avoid so many usually-useless catalog lookups. Pulling the namespace name, in particular, is expensive and we generally are not going to need the result. In the child-rel case it'd be much better to pass the opened relation and let the error-check subroutine work from that. Maybe we should just do it like that at the start of the recursion, too? Or pass the relid and let the subroutine look up the names only in the error case. A completely different line of thought is that this doesn't seem like a terribly bulletproof fix, since children could get added to a partitioned table after we look. Maybe it'd be better to check the relkind at the last moment before we do something that depends on a child table being a relation. regards, tom lane
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On 2022-Oct-27, Michael Paquier wrote: > On Thu, Sep 29, 2022 at 08:09:56PM -0700, Nathan Bossart wrote: > > Looks reasonable to me. > > 0001, to move pg_pwritev_with_retry() to a new home, seems fine, so > applied. Maybe something a bit useless, but while perusing the commits I noticed a forward struct declaration was moved from one file to another; this is claimed to avoid including port/pg_iovec.h in common/file_utils.h. We do that kind of thing in a few places, but in this particular case it seems a bit of a pointless exercise, since pg_iovec.h doesn't include anything else and it's a quite simple header. So I'm kinda proposing that we only do the forward struct initialization dance when it really saves on things -- in particular, when it helps avoid or reduce massive indirect header inclusion. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ >From c170367e5ac7dca2dcd11b5cbb6d351b1b205842 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sun, 30 Oct 2022 15:22:24 +0100 Subject: [PATCH] No need to avoid including pg_iovec.h --- src/include/common/file_utils.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index 2c5dbcb0b1..20fe5806fb 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -15,6 +15,8 @@ #include +#include "port/pg_iovec.h" + typedef enum PGFileType { PGFILETYPE_ERROR, @@ -24,7 +26,6 @@ typedef enum PGFileType PGFILETYPE_LNK } PGFileType; -struct iovec; /* avoid including port/pg_iovec.h here */ #ifdef FRONTEND extern int fsync_fname(const char *fname, bool isdir); @@ -40,7 +41,7 @@ extern PGFileType get_dirent_type(const char *path, int elevel); extern ssize_t pg_pwritev_with_retry(int fd, - const struct iovec *iov, + const iovec *iov, int iovcnt, off_t offset); -- 2.30.2
Re: How to started with Contributions
Thanks, I'll check them out. On Sun, 30 Oct, 2022, 2:47 pm Andy Fan, wrote: > HI Joseph: > > Maybe the following links are helpful. > > > https://www.postgresql.org/message-id/pine.bsf.4.21.9912052011040.823-100...@thelab.hub.org > > https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F > > On Sun, Oct 30, 2022 at 3:42 PM 1DS20CS093 Joseph Raj Vishal < > josephrvis...@gmail.com> wrote: > >> Respected Sir/Madam >> I am Joseph Raj Vishal, a Computer Science undergrad. I have just entered >> third year at Dayananda Sagar College of Engineering in Bengaluru. I am new >> to open source contributions but I am well aware of Python, Java, Django, >> PHP, SQL, Javascript,Kotlin and Android. I would love to contribute to your >> organisation but I would need your help getting started. >> Hoping to hear from you soon >> >> Regards >> Joseph Raj Vishal >> > > > > -- > Best Regards > Andy Fan >
Re: Segfault on logical replication to partitioned table with foreign children
On 2022-Oct-28, ilya.v.gladys...@gmail.com wrote: > This will cause a segfault or raise an assert, because inserting into > foreign tables via logical replication is not possible. The solution I > propose is to add recursive checks of relkind for children of a target, > if the target is a partitioned table. However, I imagine that the only reason we don't support this is that the code hasn't been written yet. I think it would be better to write that code, so that we don't have to raise any error at all (unless the foreign table is something that doesn't support DML, in which case we would have to raise an error). Of course, we still have to fix it in backbranches, but we can just do it as a targeted check at the moment of insert/update, not at the moment of subscription create/alter. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)
pg15 inherited stats expressions: cache lookup failed for statistics object
postgres=# \set QUIET CREATE TABLE stxdinp (i int, j int) PARTITION BY RANGE(i); CREATE TABLE stxdinp1 PARTITION OF stxdinp FOR VALUES FROM (1)TO(10); INSERT INTO stxdinp SELECT generate_series(1,9)a; CREATE STATISTICS stxdp ON i,j FROM stxdinp; ANALYZE stxdinp; explain SELECT i, j, COUNT(1) FROM ONLY stxdinp GROUP BY 1,2; ERROR: cache lookup failed for statistics object 4060843 It's evidently an issue with 269b532ae (Add stxdinherit flag to pg_statistic_ext_data) (gdb) bt #0 pg_re_throw () at elog.c:1795 #1 0x0096578a in errfinish (filename=, filename@entry=0xaf0442 "extended_stats.c", lineno=lineno@entry=2467, funcname=funcname@entry=0xaf0720 <__func__.28166> "statext_expressions_load") at elog.c:588 #2 0x007efd07 in statext_expressions_load (stxoid=3914359, inh=, idx=idx@entry=0) at extended_stats.c:2467 #3 0x00913947 in examine_variable (root=root@entry=0x2b5b530, node=node@entry=0x2b88820, varRelid=varRelid@entry=7, vardata=vardata@entry=0x7ffe5baa2f00) at selfuncs.c:5264 #4 0x009141ae in get_restriction_variable (root=root@entry=0x2b5b530, args=args@entry=0x2b88a30, varRelid=varRelid@entry=7, vardata=vardata@entry=0x7ffe5baa2f90, other=other@entry=0x7ffe5baa2f88, varonleft=varonleft@entry=0x7ffe5baa2f87) at selfuncs.c:4848 #5 0x00915535 in eqsel_internal (fcinfo=, negate=negate@entry=false) at selfuncs.c:263 #6 0x009155f6 in eqsel (fcinfo=) at selfuncs.c:226 #7 0x0096a373 in FunctionCall4Coll (flinfo=flinfo@entry=0x7ffe5baa3090, collation=collation@entry=0, arg1=arg1@entry=45462832, arg2=arg2@entry=1320, arg3=arg3@entry=45648432, arg4=arg4@entry=7) at fmgr.c:1198 #8 0x0096a92a in OidFunctionCall4Coll (functionId=, collation=collation@entry=0, arg1=arg1@entry=45462832, arg2=arg2@entry=1320, arg3=arg3@entry=45648432, arg4=arg4@entry=7) at fmgr.c:1434 #9 0x0077e759 in restriction_selectivity (root=root@entry=0x2b5b530, operatorid=operatorid@entry=1320, args=0x2b88a30, inputcollid=0, varRelid=varRelid@entry=7) at plancat.c:1880 #10 0x00728dc9 in clause_selectivity_ext (root=root@entry=0x2b5b530, clause=0x2b889d8, clause@entry=0x2b86e88, varRelid=varRelid@entry=7, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0, use_extended_stats=use_extended_stats@entry=true) at clausesel.c:875 #11 0x007291b6 in clauselist_selectivity_ext (root=root@entry=0x2b5b530, clauses=0x2b88fc0, varRelid=7, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0, use_extended_stats=use_extended_stats@entry=true) at clausesel.c:185 #12 0x0072962e in clauselist_selectivity (root=root@entry=0x2b5b530, clauses=, varRelid=, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0) at clausesel.c:108 #13 0x0072fb1d in get_parameterized_baserel_size (root=root@entry=0x2b5b530, rel=rel@entry=0x2b73900, param_clauses=param_clauses@entry=0x2b88f68) at costsize.c:5015 #14 0x007836f6 in get_baserel_parampathinfo (root=root@entry=0x2b5b530, baserel=baserel@entry=0x2b73900, required_outer=required_outer@entry=0x2b86478) at relnode.c:1346 #15 0x00776819 in create_seqscan_path (root=root@entry=0x2b5b530, rel=rel@entry=0x2b73900, required_outer=required_outer@entry=0x2b86478, parallel_workers=parallel_workers@entry=0) at pathnode.c:937 #16 0x0077a32c in reparameterize_path (root=root@entry=0x2b5b530, path=path@entry=0x2b847b0, required_outer=required_outer@entry=0x2b86478, loop_count=loop_count@entry=1) at pathnode.c:3872 #17 0x007249bc in get_cheapest_parameterized_child_path (root=root@entry=0x2b5b530, rel=, required_outer=required_outer@entry=0x2b86478) at allpaths.c:1996 #18 0x00727619 in add_paths_to_append_rel (root=root@entry=0x2b5b530, rel=rel@entry=0x2b6a6a8, live_childrels=live_childrels@entry=0x2b858e8) at allpaths.c:1597 #19 0x00728084 in set_append_rel_pathlist (root=root@entry=0x2b5b530, rel=rel@entry=0x2b6a6a8, rti=rti@entry=6, rte=rte@entry=0x2b5e1c0) at allpaths.c:1270 #20 0x00727e17 in set_rel_pathlist (root=root@entry=0x2b5b530, rel=0x2b6a6a8, rti=rti@entry=6, rte=0x2b5e1c0) at allpaths.c:483 #21 0x00727f8a in set_base_rel_pathlists (root=root@entry=0x2b5b530) at allpaths.c:355 #22 0x007286fd in make_one_rel (root=root@entry=0x2b5b530, joinlist=joinlist@entry=0x2b6fd98) at allpaths.c:225 #23 0x007512d5 in query_planner (root=root@entry=0x2b5b530, qp_callback=qp_callback@entry=0x75300a , qp_extra=qp_extra@entry=0x7ffe5baa3670) at planmain.c:276 #24 0x007589ec in grouping_planner (root=root@entry=0x2b5b530, tuple_fraction=, tuple_fraction@entry=0) at planner.c:1467 #25 0x0075a5f2 in subquery_planner (glob=, parse=parse@entry=0x2b23c08, parent_root=parent_root@entry=0x2736768, hasRecursion=hasRecursion@entry=false, tuple_fraction=) at planner.c:1044 #26 0x00726567 in set_subquery_pathlist (root=root@entry=0x2736768, rel=rel@entry=0x275
Re: Code checks for App Devs, using new options for transaction behavior
On Fri, 28 Oct 2022 at 10:33, Simon Riggs wrote: > Thanks for the feedback, I will make all of those corrections in the > next version. New version attached. I've rolled 002-004 into one patch, but can split again as needed. -- Simon Riggshttp://www.EnterpriseDB.com/ 001_psql_parse_only.v1.patch Description: Binary data 002_nested_xacts_and_rollback_on_commit.v8.patch Description: Binary data
Re: Schema variables - new implementation for Postgres 15
> On Thu, Oct 13, 2022 at 07:41:32AM +0200, Pavel Stehule wrote: > rebased with simplified code related to usage of pfree function Thanks for the patch, great work! I've got a couple of questions, although I haven't fully finished reviewing yet (so more to come): * I'm curious about ALTER VARIABLE. Current implementation allows altering only the name, schema or the owner -- why not e.g. immutability? * psql tab completion implementation mentions that CREATE VARIABLE could be used inside CREATE SCHEMA: /* CREATE VARIABLE --- is allowed inside CREATE SCHEMA, so use TailMatches */ /* Complete CREATE VARIABLE with AS */ else if (TailMatches("IMMUTABLE")) Is that correct? It doesn't like it works, and from what I see it requires some modifications in transformCreateSchemaStmt and schema_stmt. * psql describe mentions the following: /* * Most functions in this file are content to print an empty table when * there are no matching objects. We intentionally deviate from that * here, but only in !quiet mode, for historical reasons. */ I guess it's taken from listTables, and the extended versions says "because of the possibility that the user is confused about what the two pattern arguments mean". Are those historical reasons apply to variables as well?
Re: Schema variables - new implementation for Postgres 15
Hi ne 30. 10. 2022 v 19:05 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Thu, Oct 13, 2022 at 07:41:32AM +0200, Pavel Stehule wrote: > > rebased with simplified code related to usage of pfree function > > Thanks for the patch, great work! > > I've got a couple of questions, although I haven't fully finished > reviewing yet > (so more to come): > > * I'm curious about ALTER VARIABLE. Current implementation allows altering > only > the name, schema or the owner -- why not e.g. immutability? > It is just in category - "not implemented yet". The name, schema or owner doesn't change behavior. It can be possible (in next versions) to change type, default expression, immutability (I think). But the patch is long enough so I prefer just to support basic generic ALTER related to schema, and other possibilities to implement in next iterations. > > * psql tab completion implementation mentions that CREATE VARIABLE could be > used inside CREATE SCHEMA: > > /* CREATE VARIABLE --- is allowed inside CREATE SCHEMA, so use > TailMatches */ > /* Complete CREATE VARIABLE with AS */ > else if (TailMatches("IMMUTABLE")) > > Is that correct? It doesn't like it works, and from what I see it > requires > some modifications in transformCreateSchemaStmt and schema_stmt. > yes, This is a bug. It should be fixed > > * psql describe mentions the following: > > /* > * Most functions in this file are content to print an empty table > when > * there are no matching objects. We intentionally deviate from > that > * here, but only in !quiet mode, for historical reasons. > */ > > I guess it's taken from listTables, and the extended versions says > "because > of the possibility that the user is confused about what the two pattern > arguments mean". Are those historical reasons apply to variables as well? > The behave is same like the tables (2022-10-30 19:48:14) postgres=# \dt Did not find any relations. (2022-10-30 19:48:16) postgres=# \dV Did not find any session variables. Thank you for comments Pavel
Re: warn if GUC set to an invalid shared library
On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby wrote: > > On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote: > > It caused no issue when I changed: > > > > /* Check that it's acceptable for the indicated > > parameter */ > > if (!parse_and_validate_value(record, name, value, > > - PGC_S_FILE, ERROR, > > + PGC_S_TEST, ERROR, > > &newval, &newextra)) > > > > I'm not sure where to go from here. > > I'm hoping for some guidance ; this simple change may be naive, but I'm not > sure what a wider change would look like. I assume you mean guidance on implementation details here, and not on design. I still think this is a useful patch and I'd be happy to review and try out future iterations, but I don't have any useful input here. Also, for what it's worth, I think requiring the libraries to be in place before running ALTER SYSTEM does not really seem that onerous. I can't really think of use cases it precludes. Thanks, Maciek
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Sun, Oct 30, 2022 at 03:44:32PM +0100, Alvaro Herrera wrote: > So I'm kinda proposing that we only do the forward struct initialization > dance when it really saves on things -- in particular, when it helps > avoid or reduce massive indirect header inclusion. Sure. > extern ssize_t pg_pwritev_with_retry(int fd, > - const struct iovec *iov, > + const iovec *iov, > int iovcnt, > off_t offset); However this still needs to be defined as a struct, no? -- Michael signature.asc Description: PGP signature
Re: Simplifying our Trap/Assert infrastructure
On Fri, Oct 28, 2022 at 09:36:23AM +0200, Peter Eisentraut wrote: > Would there be a use for that? It's currently only used in the atomics > code. Yep, but they would not trigger when using atomics in the frontend code. We don't have any use for that in core on HEAD, still that could be useful for some external frontend code? Please see the attached. -- Michael diff --git a/src/include/c.h b/src/include/c.h index d70ed84ac5..12ff782a63 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -800,7 +800,12 @@ typedef NameData *Name; #include #define Assert(p) assert(p) #define AssertMacro(p) ((void) assert(p)) -#define AssertPointerAlignment(ptr, bndr) ((void)true) + +/* + * Check that `ptr' is `bndr' aligned. + */ +#define AssertPointerAlignment(ptr, bndr) \ + Assert(TYPEALIGN(bndr, (uintptr_t)(ptr)) == (uintptr_t)(ptr)) #else /* USE_ASSERT_CHECKING && !FRONTEND */ signature.asc Description: PGP signature
Re: GUC values - recommended way to declare the C variables?
On Fri, Oct 28, 2022 at 6:05 PM Michael Paquier wrote: > > On Fri, Oct 28, 2022 at 11:48:13AM +0900, Michael Paquier wrote: > > Thanks. I have not looked at the checkup logic yet, but the central > > declarations seem rather sane, and I have a few comments about the > > latter. > > So, I've had the energy to look at the check logic today, and noticed > that, while the proposed patch is doing the job when loading the > in-core GUCs, nothing is happening for the custom GUCs that could be > loaded through shared_preload_libraries or just from a LOAD command. > > After adding an extra check in define_custom_variable() (reworking a > bit the interface proposed while on it), I have found a few more > issues than what's been already found on this thread: > - 5 missing spots in pg_stat_statements. > - 3 float rounding issues in pg_trgm. > - 1 spot in pg_prewarm. > - A few more that had no initialization, but these had a default of > false/0/0.0 so it does not influence the end result but I have added > some initializations anyway. > > With all that addressed, I am finishing with the attached. I have > added some comments for the default definitions depending on the > CFLAGS, explaining the reasons behind the choices made. The CI has > produced a green run, which is not the same as the buildfarm, still > gives some confidence. > > Thoughts? LGTM. The patch was intended to expose mismatches, and it seems to be doing that job already... I only had some nitpicks for a couple of the new comments, below: == 1. src/include/storage/bufmgr.h + +/* effective when prefetching is available */ +#ifdef USE_PREFETCH +#define DEFAULT_EFFECTIVE_IO_CONCURRENCY 1 +#define DEFAULT_MAINTENANCE_IO_CONCURRENCY 10 +#else +#define DEFAULT_EFFECTIVE_IO_CONCURRENCY 0 +#define DEFAULT_MAINTENANCE_IO_CONCURRENCY 0 +#endif Maybe avoid the word "effective" since that is also one of the GUC names. Use uppercase. SUGGESTION /* Only applicable when prefetching is available */ == 2. src/include/utils/ps_status.h +/* Disabled on Windows as the performance overhead can be significant */ +#ifdef WIN32 +#define DEFAULT_UPDATE_PROCESS_TITLE false +#else +#define DEFAULT_UPDATE_PROCESS_TITLE true +#endif extern PGDLLIMPORT bool update_process_title; Perhaps put that comment inside the #ifdef WIN32 SUGGESTION #ifdef WIN32 /* Disabled on Windows because the performance overhead can be significant */ #define DEFAULT_UPDATE_PROCESS_TITLE false #else ... == src/backend/utils/misc/guc.c 3. InitializeGUCOptions @@ -1413,6 +1496,9 @@ InitializeGUCOptions(void) hash_seq_init(&status, guc_hashtab); while ((hentry = (GUCHashEntry *) hash_seq_search(&status)) != NULL) { + /* check mapping between initial and default value */ + Assert(check_GUC_init(hentry->gucvar)); + Use uppercase. Minor re-wording. SUGGESTION /* Check the GUC default and declared initial value for consistency */ ~~~ 4. define_custom_variable Same as #3. -- Kind Regards, Peter Smith Fujitsu Australia
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Wed, Oct 26, 2022 at 10:55 AM Melanie Plageman wrote: + The pg_statio_ and + pg_stat_io views are primarily useful to determine + the effectiveness of the buffer cache. When the number of actual disk reads Totally nitpicking, but this reads a little funny to me. Previously the trailing underscore suggested this is a group, and now with pg_stat_io itself added (stupid question: should this be "pg_statio"?), it sounds like we're talking about two views: pg_stat_io and "pg_statio_". Maybe something like "The pg_stat_io view and the pg_statio_ set of views are primarily..."? + by that backend type in that IO context. Currently only a subset of IO + operations are tracked here. WAL IO, IO on temporary files, and some forms + of IO outside of shared buffers (such as when building indexes or moving a + table from one tablespace to another) could be added in the future. Again nitpicking, but should this be "may be added"? I think "could" suggests the possibility of implementation, whereas "may" feels more like a hint as to how the feature could evolve. + portion of the main shared buffer pool. This pattern is called a + Buffer Access Strategy in the + PostgreSQL source code and the fixed-size + ring buffer is referred to as a strategy ring buffer for + the purposes of this view's documentation. + Nice, I think this explanation is very helpful. You also use the term "strategy context" and "strategy operation" below. I think it's fairly obvious what those mean, but pointing it out in case we want to note that here, too. + read and extended for Maybe "plus" instead of "and" here for clarity (I'm assuming that's what the "and" means)? + backend_types autovacuum launcher, + autovacuum worker, client backend, + standalone backend, background + worker, and walsender for all + io_contexts is similar to the sum of I'm reviewing the rendered docs now, and I noticed sentences like this are a bit hard to scan: they force the reader to parse a big list of backend types before even getting to the meat of what this is talking about. Should we maybe reword this so that the backend list comes at the end of the sentence? Or maybe even use a list (e.g., like in the "state" column description in pg_stat_activity)? + heap_blks_read, idx_blks_read, + tidx_blks_read, and + toast_blks_read in + pg_statio_all_tables. and + blks_read from If using the PostgreSQL extension, + , + read for + backend_types autovacuum launcher, + autovacuum worker, client backend, + standalone backend, background + worker, and walsender for all + io_contexts is equivalent to Same comment as above re: the lengthy list. + Normal client backends should be able to rely on maintenance processes + like the checkpointer and background writer to write out dirty data as Nice--it's great to see this mentioned. But I think these are generally referred to as "auxiliary" not "maintenance" processes, no? + If using the PostgreSQL extension, + , written and + extended for backend_types Again, should this be "plus" instead of "and"? + + bytes_conversion bigint + I think this general approach works (instead of unit). I'm not wild about the name, but I don't really have a better suggestion. Maybe "op_bytes" (since each cell is counting the number of I/O operations)? But I think bytes_conversion is okay. Also, is this (in the middle of the table) the right place for this column? I would have expected to see it before or after all the actual I/O op cells. + io_contexts. When a Buffer Access + Strategy reuses a buffer in the strategy ring, it must evict its + contents, incrementing reused. When a Buffer + Access Strategy adds a new shared buffer to the strategy ring + and this shared buffer is occupied, the Buffer Access + Strategy must evict the contents of the shared buffer, + incrementing evicted. I think the parallel phrasing here makes this a little hard to follow. Specifically, I think "must evict its contents" for the strategy case sounds like a bad thing, but in fact this is a totally normal thing that happens as part of strategy access, no? The idea is you probably won't need that buffer again, so it's fine to evict it. I'm not sure how to reword, but I think the current phrasing is misleading. + The number of times a bulkread found the current + buffer in the fixed-size strategy ring dirty and requiring flush. Maybe "...found ... to be dirty..."? + frequent vacuuming or more aggressive autovacuum settings, as buffers are + dirtied during a bulkread operation when updating the hint bit or when + performing on-access pruning. Are there docs to cross-reference here, especially for pruning? I couldn't find much except a few un-explained mentions in the page layout docs [2], and most of the search results refer to partition pruning. Searching for hint bits at least gives some info in blog posts and the wiki. + again. A high number of repossessions is a sign of contention for the + blocks operated on by the strategy operation. This (and in ge
Re: resowner "cold start" overhead
At Sat, 29 Oct 2022 13:00:25 -0700, Andres Freund wrote in > One way to reduce the size increase would be to use the space for initialarr > to store variables we don't need while initialarr is used. E.g. itemsarr, > maxitems, lastarr are candidates. But I suspect that the code complication > isn't worth it. +1 > A different approach could be to not go for the "embedded initial elements" > approach, but instead to not delete resource owners / resource arrays inside > ResourceOwnerDelete(). We could stash them in a bounded list of resource > owners, to be reused by ResourceOwnerCreate(). We do end up creating a > several resource owners even for the simplest queries. We often do end up creating several resource owners that aquires not an element at all . On the other hand, a few resource owners sometimes grown up to 2048 (several times) or 4096 (one time) elements druing a run of the regressiont tests. (I saw catlist, tupdesc and relref grown to 2048 or more elements.) > The advantage of that scheme is that it'd save more and that we'd only reserve > space for ResourceArrays that are actually used in the current workload - > often the majority of arrays won't be. Thus I believe preserving resource owners works well. Preserving resource arrays also would work for the time efficiency, but some resource owners may end up keeping large amount of memory unnecessarily most of the time for the backend lifetime. I guess that the amount is far less than the possible bloat by catcache.. > A potential problem would be that we don't want to use the "hashing" style > ResourceArrays forever, I don't think they'll be as fast for other cases. But > we could reset the arrays when they get large. I'm not sure linear search (am I correct?) doesn't harm for 2048 or more elements. I think that the "hashing" style doesn't prevent the arrays from being reset (free-d) at transaction end (or at resource owner deletion). That allows releasing unused elements while in transaction but I'm not sure we need to be so keen to reclaim space during a transaction. > Greetings, > > Andres Freund > > https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3%40awork3.anarazel.de regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Lock on ShmemVariableCache fields?
Hi, hackers The VariableCacheData says nextOid and oidCount are protected by OidGenLock. However, we update them without holding the lock on OidGenLock in BootStrapXLOG(). Same as nextXid, for other fields that are protected by XidGenLock, it holds the lock, see SetTransactionIdLimit(). void BootStrapXLOG(void) { [...] ShmemVariableCache->nextXid = checkPoint.nextXid; ShmemVariableCache->nextOid = checkPoint.nextOid; ShmemVariableCache->oidCount = 0; [...] SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); [...] } I also find a similar code in StartupXLOG(). Why we don't hold the lock on OidGenLock when updating ShmemVariableCache->nextOid and ShmemVariableCache->oidCount? If the lock is unnecessary, I think adding some comments is better. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Adding doubly linked list type which stores the number of items in the list
Thank you for having another look at this On Sat, 29 Oct 2022 at 18:32, Bharath Rupireddy wrote: > 1. I guess we need to cast the 'node' parameter too, something like > below? I'm reading the comment there talking about compilers > complaning about the unused function arguments. > dlist_member_check(head, node) ((void) (head); (void) (node);) I looked at dlist_check() and I didn't quite manage to figure out why the cast is needed. As far as I can see, there are no calls where we only pass dlist_head solely for the dlist_check(). For dlist_member_check(), dlist_delete_from() does not use the 'head' parameter for anything apart from dlist_member_check(), so I believe the cast is required for 'head'. I think I'd rather only add the cast for 'node' unless we really require it. Cargo-culting it in there just because that's what the other macros do does not seem like a good idea to me. > Can we put max limit, at least in assert, something like below, on > 'count' instead of saying above? I'm not sure if there's someone > storing 4 billion items, but it will be a good-to-have safety from the > data structure perspective if others think it's not an overkill. > Assert(head->count > 0 && head->count <= PG_UINT32_MAX); 'count' is a uint32. It's always going to be <= PG_UINT32_MAX. My original thoughts were that it seems unlikely we'd ever give an assert build a workload that would ever have 2^32 dlist_nodes in dclist. Having said that, perhaps it would do no harm to add some overflow checks to 'count'. I've gone and added some Assert(head->count > 0) after we do count++. > + * As dlist_delete but performs checks in ILIST_DEBUG to ensure that 'node' > + * belongs to 'head'. > I think 'Same as dlist_delete' instead of just 'As dlist_delete' I don't really see what's wrong with this. We use "As above" when we mean "Same as above" in many locations. Anyway, I don't feel strongly about not adding the word, so I've adjusted the wording in that comment which includes adding the word "Same" at the start. > 5. > + * Caution: 'node' must be a member of 'head'. > + * Caller must ensure that 'before' is a member of 'head'. > Can we have the same comments around something like below? I've adjusted dclist_insert_after() and dclist_insert_before(). Each dclist function that uses dlist_member_check() now has the same text. > 8. Wondering if we need dlist_delete_from() at all. Can't we just add > dlist_member_check() dclist_delete_from() and call dlist_delete() > directly? Certainly, but I made it that way on purpose. I wanted dclist to have a superset of the functions that dlist has. I just see no reason why dlist shouldn't have dlist_delete_from() when dclist has it. I've attached the v3 version of the patch which includes some additional polishing work. David diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index b01b39b008..a34e9b352d 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -196,8 +196,7 @@ typedef struct RewriteMappingFile TransactionId xid; /* xid that might need to see the row */ int vfd;/* fd of mappings file */ off_t off;/* how far have we written yet */ - uint32 num_mappings; /* number of in-memory mappings */ - dlist_head mappings; /* list of in-memory mappings */ + dclist_head mappings; /* list of in-memory mappings */ charpath[MAXPGPATH];/* path, for error messages */ } RewriteMappingFile; @@ -864,9 +863,10 @@ logical_heap_rewrite_flush_mappings(RewriteState state) Oid dboid; uint32 len; int written; + uint32 num_mappings = dclist_count(&src->mappings); /* this file hasn't got any new mappings */ - if (src->num_mappings == 0) + if (num_mappings == 0) continue; if (state->rs_old_rel->rd_rel->relisshared) @@ -874,7 +874,7 @@ logical_heap_rewrite_flush_mappings(RewriteState state) else dboid = MyDatabaseId; - xlrec.num_mappings = src->num_mappings; + xlrec.num_mappings = num_mappings; xlrec.mapped_rel = RelationGetRelid(state->rs_old_rel); xlrec.mapped_xid = src->xid; xlrec.mapped_db = dboid; @@ -882,31 +882,30 @@ logical_heap_rewrite_flush_mappings(RewriteState state) xlrec.start_lsn = state->rs_begin_lsn; /* write all mappings consecutively */ - len = src->num_mappings * sizeof(LogicalRewriteMappingData); + len = num_mappings * sizeof(LogicalRewriteMappingData); waldata_start = waldata = palloc(len);
RE: Segfault on logical replication to partitioned table with foreign children
On Sun, Oct 30, 2022 9:39 PM Tom Lane wrote: > > What I'm wondering about is whether we can refactor this code > to avoid so many usually-useless catalog lookups. Pulling the > namespace name, in particular, is expensive and we generally > are not going to need the result. In the child-rel case it'd > be much better to pass the opened relation and let the error-check > subroutine work from that. Maybe we should just do it like that > at the start of the recursion, too? Or pass the relid and let > the subroutine look up the names only in the error case. > > A completely different line of thought is that this doesn't seem > like a terribly bulletproof fix, since children could get added to > a partitioned table after we look. Maybe it'd be better to check > the relkind at the last moment before we do something that depends > on a child table being a relation. > I agree. So maybe we can add this check in apply_handle_tuple_routing(). diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 5250ae7f54..e941b68e4b 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -2176,6 +2176,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata, Assert(partrelinfo != NULL); partrel = partrelinfo->ri_RelationDesc; + /* Check for supported relkind. */ + CheckSubscriptionRelkind(partrel->rd_rel->relkind, + relmapentry->remoterel.nspname, relmapentry->remoterel.relname); + /* * To perform any of the operations below, the tuple must match the * partition's rowtype. Convert if needed or just copy, using a dedicated Regards, Shi yu
Re: Documentation for building with meson
Hi, On Thu, Oct 27, 2022 at 1:04 AM John Naylor wrote: > +# Run the main pg_regress and isolation tests > +meson test --suite main > > This does not work for me in a fresh install until running > > meson test --suite setup > > In fact, we see in > > https://wiki.postgresql.org/wiki/Meson > > meson test --suite setup --suite main > You are right that this will be needed for a new install. I've added --suite setup in the testing section in the v3 of the patch (attached). > That was just an eyeball check from a naive user -- it would be good to > try running everything documented here. > I retried all the instructions as suggested and they work for me. Regards, Samay > > -- > John Naylor > EDB: http://www.enterprisedb.com > v3-0001-Add-docs-for-building-with-meson.patch Description: Binary data
Prefetch the next tuple's memory during seqscans
As part of the AIO work [1], Andres mentioned to me that he found that prefetching tuple memory during hot pruning showed significant wins. I'm not proposing anything to improve HOT pruning here, but as a segue to get the prefetching infrastructure in so that there are fewer AIO patches, I'm proposing we prefetch the next tuple during sequence scans in while page mode. It turns out the gains are pretty good when we apply this: -- table with 4 bytes of user columns create table t as select a from generate_series(1,1000)a; vacuum freeze t; select pg_prewarm('t'); Master @ a9f8ca600 # select * from t where a = 0; Time: 355.001 ms Time: 354.573 ms Time: 354.490 ms Time: 354.556 ms Time: 354.335 ms Master + 0001 + 0003: # select * from t where a = 0; Time: 328.578 ms Time: 329.387 ms Time: 329.349 ms Time: 329.704 ms Time: 328.225 ms (avg ~7.7% faster) -- table with 64 bytes of user columns create table t2 as select a,a a2,a a3,a a4,a a5,a a6,a a7,a a8,a a9,a a10,a a11,a a12,a a13,a a14,a a15,a a16 from generate_series(1,1000)a; vacuum freeze t2; select pg_prewarm('t2'); Master: # select * from t2 where a = 0; Time: 501.725 ms Time: 501.815 ms Time: 503.225 ms Time: 501.242 ms Time: 502.394 ms Master + 0001 + 0003: # select * from t2 where a = 0; Time: 412.076 ms Time: 410.669 ms Time: 410.490 ms Time: 409.782 ms Time: 410.843 ms (avg ~22% faster) This was tested on an AMD 3990x CPU. I imagine the CPU matters quite a bit here. It would be interesting to see if the same or similar gains can be seen on some modern intel chip too. I believe Thomas wrote the 0001 patch (same as patch in [2]?). I only quickly put together the 0003 patch. I wondered if we might want to add a macro to 0001 that says if pg_prefetch_mem() is empty or not then use that to #ifdef out the code I added to heapam.c. Although, perhaps most compilers will be able to optimise away the extra lines that are figuring out what the address of the next tuple is. My tests above are likely the best case for this. It seems plausible to me that if there was a much more complex plan that found a reasonable number of tuples and did something with them that we wouldn't see the same sort of gains. Also, it also does not seem impossible that the prefetch just results in evicting some useful-to-some-other-exec-node cache line or that the prefetched tuple gets flushed out the cache by the time we get around to fetching the next tuple from the scan again due to various other node processing that's occurred since the seq scan was last called. I imagine such things would be indistinguishable from noise, but I've not tested. I also tried prefetching out by 2 tuples. It didn't help any further than prefetching 1 tuple. I'll add this to the November CF. David [1] https://www.postgresql.org/message-id/flat/20210223100344.llw5an2akleng...@alap3.anarazel.de [2] https://www.postgresql.org/message-id/CA%2BhUKG%2Bpi63ZbcZkYK3XB1pfN%3DkuaDaeV0Ha9E%2BX_p6TTbKBYw%40mail.gmail.com From 2fd10f1266550f26f4395de080bcdcf89b6859b6 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Wed, 19 Oct 2022 08:54:01 +1300 Subject: [PATCH 1/3] Add pg_prefetch_mem() macro to load cache lines. Initially mapping to GCC, Clang and MSVC builtins. Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com --- config/c-compiler.m4 | 17 configure | 40 ++ configure.ac | 3 +++ meson.build| 3 ++- src/include/c.h| 8 src/include/pg_config.h.in | 3 +++ src/tools/msvc/Solution.pm | 1 + 7 files changed, 74 insertions(+), 1 deletion(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 000b075312..582a47501c 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, [Define to 1 if your compiler understands $1.]) fi])# PGAC_CHECK_BUILTIN_FUNC +# PGAC_CHECK_BUILTIN_VOID_FUNC +# --- +# Variant for void functions. +AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC], +[AC_CACHE_CHECK(for $1, pgac_cv$1, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([ +void +call$1($2) +{ +$1(x); +}], [])], +[pgac_cv$1=yes], +[pgac_cv$1=no])]) +if test x"${pgac_cv$1}" = xyes ; then +AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, + [Define to 1 if your compiler understands $1.]) +fi])# PGAC_CHECK_BUILTIN_VOID_FUNC # PGAC_CHECK_BUILTIN_FUNC_PTR diff --git a/configure b/configure index 3966368b8d..c4685b8a1e 100755 --- a/configure +++ b/configure @@ -15988,6 +15988,46 @@ _ACEOF fi +# Can we use a built-in to prefetch memory? +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5 +$as_echo_n "checking for __builtin_prefetch... " >&6; } +if ${pgac_cv__builtin_prefetch+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end c
Re: pg15 inherited stats expressions: cache lookup failed for statistics object
On Mon, Oct 31, 2022 at 1:05 AM Justin Pryzby wrote: > I think this is what's needed. > > diff --git a/src/backend/utils/adt/selfuncs.c > b/src/backend/utils/adt/selfuncs.c > index 14e0885f19f..4450f0d682f 100644 > --- a/src/backend/utils/adt/selfuncs.c > +++ b/src/backend/utils/adt/selfuncs.c > @@ -5240,6 +5240,8 @@ examine_variable(PlannerInfo *root, Node *node, int > varRelid, > /* skip stats without per-expression stats */ > if (info->kind != STATS_EXT_EXPRESSIONS) > continue; > + if (info->inherit != rte->inh) > + continue; > > pos = 0; > foreach(expr_item, info->exprs) > I think we also need to do this when loading the ndistinct value, to skip statistics with mismatching stxdinherit in estimate_multivariate_ndistinct(). Thanks Richard
Re: GUC values - recommended way to declare the C variables?
On Mon, Oct 31, 2022 at 12:01:33PM +1100, Peter Smith wrote: > SUGGESTION > /* Only applicable when prefetching is available */ Thanks for the suggestion. Done this way, then. > +/* Disabled on Windows as the performance overhead can be significant */ > +#ifdef WIN32 > +#define DEFAULT_UPDATE_PROCESS_TITLE false > +#else > +#define DEFAULT_UPDATE_PROCESS_TITLE true > +#endif > extern PGDLLIMPORT bool update_process_title; > > Perhaps put that comment inside the #ifdef WIN32 I'd keep that externally, as ps_status.h does so. > [...] > SUGGESTION > /* Check the GUC default and declared initial value for consistency */ Okay, fine by me. I have split the change into two parts at the end: one to refactor and fix the C declarations, and a second to introduce the check routine with all the correct declarations in place. FWIW, I have been testing that with my own in-house modules and it has caught a few stupid inconsistencies. Let's see how it goes. -- Michael signature.asc Description: PGP signature
Re: pg15 inherited stats expressions: cache lookup failed for statistics object
On Mon, Oct 31, 2022 at 12:26 PM Richard Guo wrote: > On Mon, Oct 31, 2022 at 1:05 AM Justin Pryzby > wrote: > >> I think this is what's needed. >> >> diff --git a/src/backend/utils/adt/selfuncs.c >> b/src/backend/utils/adt/selfuncs.c >> index 14e0885f19f..4450f0d682f 100644 >> --- a/src/backend/utils/adt/selfuncs.c >> +++ b/src/backend/utils/adt/selfuncs.c >> @@ -5240,6 +5240,8 @@ examine_variable(PlannerInfo *root, Node *node, int >> varRelid, >> /* skip stats without per-expression stats */ >> if (info->kind != STATS_EXT_EXPRESSIONS) >> continue; >> + if (info->inherit != rte->inh) >> + continue; >> >> pos = 0; >> foreach(expr_item, info->exprs) >> > > I think we also need to do this when loading the ndistinct value, to > skip statistics with mismatching stxdinherit in > estimate_multivariate_ndistinct(). > To be concrete, I mean something like attached. BTW, I noticed a micro-optimization opportunity in examine_variable that we can fetch the RangeTblEntry for 'onerel' outside the foreach loop when iterating the extended stats so that we can do it only once rather than for each stat. Thanks Richard v1-0001-Skip-statistics-with-mismatching-stxdinherit-flag.patch Description: Binary data
Re: heavily contended lwlocks with long wait queues scale badly
At Thu, 27 Oct 2022 09:59:14 -0700, Andres Freund wrote in > But I think we can solve that fairly reasonably nonetheless. We can change > PGPROC->lwWaiting to not just be a boolean, but have three states: > 0: not waiting > 1: waiting in waitlist > 2: waiting to be woken up > > which we then can use in LWLockDequeueSelf() to only remove ourselves from the > list if we're on it. As removal from that list is protected by the wait list > lock, there's no race to worry about. Since LWLockDequeueSelf() is defined to be called in some restricted situation, there's no chance that the proc to remove is in another lock waiters list at the time the function is called. So it seems to work well. It is simple and requires no additional memory or cycles... No. It enlarges PRPC by 8 bytes, but changing lwWaiting to int8/uint8 keeps the size as it is. (Rocky8/x86-64) It just shaves off looping cycles. So +1 for what the patch does. > client patched HEAD > 16010960174 > 2 112694 116169 > 4 214287 208119 > 8 377459 373685 > 16 524132 515247 > 32 565772 554726 > 64 587716 497508 > 128 581297 415097 > 256 550296 334923 > 512 486207 243679 > 768 449673 192959 > 1024410836 157734 > 204832622482904 > 409625025232007 > > Not perfect with the patch, but not awful either. Fairly good? Agreed. The performance peak is improved by 6% and shifted to larger number of clients (32->128). > I suspect this issue might actually explain quite a few odd performance > behaviours we've seen at the larger end in the past. I think it has gotten a > bit worse with the conversion of lwlock.c to proclists (I see lots of > expensive multiplications to deal with sizeof(PGPROC)), but otherwise likely > exists at least as far back as ab5194e6f61, in 9.5. > > I guess there's an argument for considering this a bug that we should > backpatch a fix for? But given the vintage, probably not? The only thing that > gives me pause is that this is quite hard to pinpoint as happening. I don't think this is a bug but I think it might be back-patchable since it doesn't change memory footprint (if adjusted), causes no additional cost or interfarce breakage, thus it might be ok to backpatch. Since this "bug" has the nature of positive-feedback so reducing the coefficient is benetifical than the direct cause of the change. > I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc, > but I wanted to get this out to discuss before spending further time. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg15 inherited stats expressions: cache lookup failed for statistics object
On Mon, Oct 31, 2022 at 01:12:09PM +0800, Richard Guo wrote: > BTW, I noticed a micro-optimization opportunity in examine_variable that > we can fetch the RangeTblEntry for 'onerel' outside the foreach loop > when iterating the extended stats so that we can do it only once rather > than for each stat. Isn't that the kind of thing where we'd better have some regression coverage? -- Michael signature.asc Description: PGP signature
Limit of WAL LSN value of type pg_lsn
Hi all, I am trying to determine optimal standby for automatic failover based on pg_last_wal_recieve_lsn() value of all slaves. But what if max_wal_size is reached, is LSN value reset to zero. Please share some documentation that mentions clarification. Also suggest if there is any better approach to get node with best RPO out of all slaves. Thanks & regards, Shubham
Commit fest 2022-11
Hi all, As per the world clock, the next commit fest will begin in 30 hours (11/1 0:00 AoE time). I may have missed something, but it looks like we have no CFM for this one yet. Opinions, thoughts or volunteers? -- Michael signature.asc Description: PGP signature
RE: [PATCH] Support % wildcard in extension upgrade filenames
Just to reiterate the main impetus for this patch is to save PostGIS from shipping 100s of duplicate extension files for each release. > And now with the actual patch attached ... (sorry) > > --strk; > Sandro, can you submit an updated version of this patch. I was testing it out and looked good first time. But I retried just now testing against master, and it fails with commands/extension.o: In function `file_exists': postgresql-git\src\backend\commands/extension.c:3430: undefined reference to `AssertArg' It seems 2 days ago AssertArg and AssertState were removed. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b1099eca8f38f f5cfaf0901bb91cb6a22f909bc6 So your use of AssertArg needs to be replaced with Assert I guess. I did that and was able to test again with a sample extension I made Called: wildtest 1) The wildcard patch in its current state only does anything if wildcard_upgrades = true is in the control file. If it's false or missing, then the behavior of extension upgrades doesn't change. 2) It only understands % as a complete wildcard for a version number So this is legal wildtest--%--2.0.sql This does nothing wildtest--2%--2.0.sql 3) I confirmed that if you have a path such as wildtest--2.0--2.2.sql wildtest--%--2.2.sql then the exact match trumps the wildcard. In the above case if I am on 2.0 and going to 2.2, the wildtest--2.0--2.2.sql script is used instead of the wildtest--% one. 4) It is not possible to downgrade with the wildcard. For example I had paths wildtest--%--2.1.sql and I was unable to go from a version 2.2 down to a version 2.1. I didn't check why that was so, but probably a good thing. If everyone is okay with this patch, we'll go ahead and add tests and documentation to go with it. Thanks, Regina
Re: Adding doubly linked list type which stores the number of items in the list
On Mon, Oct 31, 2022 at 8:26 AM David Rowley wrote: > > I looked at dlist_check() and I didn't quite manage to figure out why > the cast is needed. As far as I can see, there are no calls where we > only pass dlist_head solely for the dlist_check(). For > dlist_member_check(), dlist_delete_from() does not use the 'head' > parameter for anything apart from dlist_member_check(), so I believe > the cast is required for 'head'. I think I'd rather only add the cast > for 'node' unless we really require it. Cargo-culting it in there just > because that's what the other macros do does not seem like a good idea > to me. Hm, you're right, dlist_member_check() needs it. Also, slist_check() needs it for slist_has_next(). dlist_check() doesn't need it, however, keeping it intact doesn't harm, I guess. > My original thoughts were that it seems unlikely we'd ever give an > assert build a workload that would ever have 2^32 dlist_nodes in > dclist. Having said that, perhaps it would do no harm to add some > overflow checks to 'count'. I've gone and added some > Assert(head->count > 0) after we do count++. So, when an overflow occurs, the head->count wraps around after PG_UINT32_MAX, meaning, becomes 0 and we will catch it in an assert build. This looks reasonable to me. However, the responsibility lies with the developers to deal with such overflows. > > 8. Wondering if we need dlist_delete_from() at all. Can't we just add > > dlist_member_check() dclist_delete_from() and call dlist_delete() > > directly? > > Certainly, but I made it that way on purpose. I wanted dclist to have > a superset of the functions that dlist has. I just see no reason why > dlist shouldn't have dlist_delete_from() when dclist has it. Okay. > I've attached the v3 version of the patch which includes some > additional polishing work. Thanks. The v3 patch looks good to me. BTW, do we need sclist_* as well? AFAICS, no such use-case exists needing slist and element count, maybe we don't need. I'm wondering if adding count to dlist_head and maintaining it as part of the existing dlist_* data structure and functions is any better than introducing dclist_*? In that case, we need only one function, dlist_count, no? Or do we choose to go dclist_* because we want to avoid the extra cost of maintaining count within dlist_*? If yes, is maintaining count in dlist_* really costly? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Lock on ShmemVariableCache fields?
HI, On Oct 31, 2022, 10:48 +0800, Japin Li , wrote: > > Hi, hackers > > The VariableCacheData says nextOid and oidCount are protected by > OidGenLock. However, we update them without holding the lock on > OidGenLock in BootStrapXLOG(). Same as nextXid, for other fields > that are protected by XidGenLock, it holds the lock, see > SetTransactionIdLimit(). > > void > BootStrapXLOG(void) > { > [...] > > ShmemVariableCache->nextXid = checkPoint.nextXid; > ShmemVariableCache->nextOid = checkPoint.nextOid; > ShmemVariableCache->oidCount = 0; > > [...] > > SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); > > [...] > } > > I also find a similar code in StartupXLOG(). Why we don't hold the lock > on OidGenLock when updating ShmemVariableCache->nextOid and > ShmemVariableCache->oidCount? > > If the lock is unnecessary, I think adding some comments is better. > > -- > Regrads, > Japin Li. > ChengDu WenWu Information Technology Co.,Ltd. > > As its name BootStrapXLOG, it’s used in BootStrap mode to initialize the template database. The process doesn’t speak SQL and the database is not ready. There won’t be concurrent access to variables. Regards, Zhang Mingli >
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Mon, Oct 31, 2022 at 5:01 AM Michael Paquier wrote: > > On Sun, Oct 30, 2022 at 03:44:32PM +0100, Alvaro Herrera wrote: > > So I'm kinda proposing that we only do the forward struct initialization > > dance when it really saves on things -- in particular, when it helps > > avoid or reduce massive indirect header inclusion. > > Sure. I don't think including pg_iovec.h in file_utils.h is a good idea. I agree that pg_iovec.h is fairly a small header file but file_utils.h is included in 21 .c files, as of today and the file_utils.h footprint might increase in future. Therefore, I'd vote for forward struct initialization as it is on HEAD today. > > extern ssize_t pg_pwritev_with_retry(int fd, > > - const struct iovec *iov, > > + const iovec *iov, > > int iovcnt, > > off_t offset); > > However this still needs to be defined as a struct, no? Yes, we need a struct there because we haven't typedef'ed struct iovec. Also, the patch forgets to remove "port/pg_iovec.h" from file_utils.c -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: GUC values - recommended way to declare the C variables?
On Mon, Oct 31, 2022 at 4:02 PM Michael Paquier wrote: > > On Mon, Oct 31, 2022 at 12:01:33PM +1100, Peter Smith wrote: > > SUGGESTION > > /* Only applicable when prefetching is available */ > > Thanks for the suggestion. Done this way, then. > > > +/* Disabled on Windows as the performance overhead can be significant */ > > +#ifdef WIN32 > > +#define DEFAULT_UPDATE_PROCESS_TITLE false > > +#else > > +#define DEFAULT_UPDATE_PROCESS_TITLE true > > +#endif > > extern PGDLLIMPORT bool update_process_title; > > > > Perhaps put that comment inside the #ifdef WIN32 > > I'd keep that externally, as ps_status.h does so. > > > [...] > > SUGGESTION > > /* Check the GUC default and declared initial value for consistency */ > > Okay, fine by me. > > I have split the change into two parts at the end: one to refactor and > fix the C declarations, and a second to introduce the check routine > with all the correct declarations in place. > > FWIW, I have been testing that with my own in-house modules and it has > caught a few stupid inconsistencies. Let's see how it goes. Thanks for pushing. -- Kind Regards, Peter Smith. Fujitsu Australia.