Re: Huge memory consumption on partitioned table with FKs
Hi Hackers,

>> I would embed all this knowledge in ri_BuildQueryKey though, without
>> adding the new function ri_GetParentConstOid. I don't think that
>> function meaningful abstraction value, and instead it would make what I
>> suggest more difficult.
> It seems to me reasonable.

Indeed, I tried to get the conparentid within ri_BuildQueryKey.
(v2_reduce_ri_SPI-plan-hash.patch)

> Somewhat of a detour, but in reviewing the patch for
> Statement-Level RI checks, Andres and I observed that SPI
> made for a large portion of the RI overhead.

If this can be resolved as part of the Statement-Level RI checks work,
then I think that would be better.

> BTW, one problem with Kuroda-san's patch is that it's using
> conparentid as the shared key, which can still lead to multiple
> queries being generated when using multi-level partitioning, because
> there would be many (intermediate) parent constraints in that case.

As Horiguchi-san also mentioned, in my patch ri_GetParentConstOid iterates
on getting the constraint OID until conparentid == InvalidOid, which should
allow it to reach the "root constraint OID" (see the sketch below).
However, the names "conparentid" and "ri_GetParentConstOid" are confusing;
we should use names like "constraint_root_id", as in Amit-san's patch.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com

v2_reduce_ri_SPI-plan-hash.patch
Description: Binary data
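For illustration, a minimal sketch of the conparentid walk described above,
under the assumption of a hypothetical helper name (the patches discussed
here compute this while loading the constraint info rather than in a
standalone function):

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_constraint.h"
#include "utils/syscache.h"

/* Hypothetical sketch: follow pg_constraint.conparentid until InvalidOid. */
static Oid
get_root_constraint_oid(Oid constraintOid)
{
    for (;;)
    {
        HeapTuple   tup;
        Oid         parentOid;

        tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintOid));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for constraint %u",
                 constraintOid);
        parentOid = ((Form_pg_constraint) GETSTRUCT(tup))->conparentid;
        ReleaseSysCache(tup);

        if (!OidIsValid(parentOid))
            return constraintOid;   /* no parent: this is the root */
        constraintOid = parentOid;
    }
}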
Re: Huge memory consumption on partitioned table with FKs
Hi Amit,

> I have attached a patch in which I've tried to merge the ideas from
> both my patch and Kuroda-san's. I liked that his patch added
> conparentid to RI_ConstraintInfo because that saves a needless
> syscache lookup for constraints that don't have a parent. I've kept
> my idea to compute the root constraint id only once in
> ri_LoadConstraint(), not on every invocation of ri_BuildQueryKey().
> Kuroda-san, anything you'd like to add to that?

Thank you for the merge! It looks good to me.
I think the fix for InvalidateConstraintCacheCallBack() is also good.
I also confirmed that the patch passes make check-world.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Huge memory consumption on partitioned table with FKs
Hi Amit-san,

I noticed that this patch
(v3-0001-ri_triggers.c-Use-root-constraint-OID-as-key-to-r.patch)
is not registered in the commitfest.
I think it needs to be registered to be committed, is that correct?

I have confirmed that this patch applies to HEAD (master)
and that check-world passes.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Huge memory consumption on partitioned table with FKs
Hi hackers,

> > I did some cosmetic fooling with this (mostly, rewriting the comments
> > YA time) and pushed it.
>
> Perfect. Thanks for your time on this.

Thank you for your help! I'm glad the problem is solved.

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Huge memory consumption on partitioned table with FKs
Hi Hackers,

I analyzed the problem and created a patch to resolve it.

# Problem 1

When you create a foreign key on a partitioned table, the referential
integrity triggers are created for each partition. Internally,
SPI_prepare() creates a plan and SPI_keepplan() caches it, so the more
partitions in the referencing table, the more plans are cached.

# Problem 2

When the referenced table is a partitioned table, the larger the number
of partitions, the larger the plan that gets cached. The plan actually
executed is simple and small if pruning is enabled, but the cached plan
includes all partition information, so the more partitions in the
referenced table, the larger the cached plan.

# Idea for solution

Constraints with the same pg_constraint.conparentid can share one plan,
if we can guarantee that all their contents are the same (a sketch
follows below). This solves Problem 1 and avoids significant memory bloat.

CachedPlan: 710MB -> 1466kB

Solving Problem 2 could also reduce memory, but I don't have a good idea.
Currently, DISCARD ALL does not discard CachedPlans created by SPI as in
this case. It may be possible to modify DISCARD ALL to discard such
CachedPlans and run it periodically, but we think the burden on the user
would be high.

# Result with patch (PG14 HEAD (e522024b) + patch)

       name       |  bytes  | pg_size_pretty
------------------+---------+----------------
 CachedPlanQuery  |   12912 | 13 kB
 CachedPlanSource |   17448 | 17 kB
 CachedPlan       | 1501192 | 1466 kB

CachedPlan * 1 record

postgres=# SELECT count(*) FROM pg_backend_memory_contexts
           WHERE name = 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
 count
-------
     1

postgres=# SELECT * FROM pg_backend_memory_contexts
           WHERE name = 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
-[ RECORD 1 ]-+---------------------------------------------------------------------------------------
name          | CachedPlan
ident         | SELECT 1 FROM "public"."ps" x WHERE "c1" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
parent        | CacheMemoryContext
level         | 2
total_bytes   | 2101248
total_nblocks | 12
free_bytes    | 613256
free_chunks   | 1
used_bytes    | 1487992
(1 record)

# Result without patch (PG14 HEAD (e522024b))

       name       |   bytes   | pg_size_pretty
------------------+-----------+----------------
 CachedPlanQuery  |   1326280 | 1295 kB
 CachedPlanSource |   1474528 | 1440 kB
 CachedPlan       | 744009200 | 710 MB

CachedPlan * 500 records

postgres=# SELECT count(*) FROM pg_backend_memory_contexts
           WHERE name = 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
 count
-------
   500

SELECT * FROM pg_backend_memory_contexts
WHERE name = 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
name          | CachedPlan
ident         | SELECT 1 FROM "public"."ps" x WHERE "c1" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
parent        | CacheMemoryContext
level         | 2
total_bytes   | 2101248
total_nblocks | 12
free_bytes    | 613256
free_chunks   | 1
used_bytes    | 1487992
...(500 same records)

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com

v1_reduce_ri_SPI-plan-hash.patch
Description: Binary data
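To make the idea concrete, a rough sketch of keying the SPI plan cache on
the root constraint OID (structs trimmed, and the constraint_root_id field
name assumed from the later discussion; illustrative only, not the final
patch):

#include "postgres.h"

/* Trimmed, illustrative versions of the structs discussed in this thread. */
typedef struct RI_ConstraintInfo
{
    Oid         constraint_id;      /* OID of the partition's own constraint */
    Oid         constraint_root_id; /* OID of the topmost parent constraint
                                     * (assumed field name) */
    /* ... the real struct has many more fields ... */
} RI_ConstraintInfo;

typedef struct RI_QueryKey
{
    Oid         constr_id;      /* constraint OID used as hash key */
    int32       constr_queryno; /* query type ID */
} RI_QueryKey;

/*
 * Sketch: build the hash key for the cached SPI plan from the root
 * constraint OID, so all partitions inheriting the same FK constraint
 * share one cached plan instead of one plan per partition.
 */
static void
ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
                 int32 constr_queryno)
{
    key->constr_id = riinfo->constraint_root_id;
    key->constr_queryno = constr_queryno;
}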
Re: Huge memory consumption on partitioned table with FKs
Hi Amit-san,

> Thanks for checking. Indeed, it should have been added to the January
> commit-fest. I've added it to the March one:
>
> https://commitfest.postgresql.org/32/2930/

Thank you for your quick response!

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: simplifying foreign key/RI checks
Hi, Amit-san,

Nice patch. I have confirmed that this solves the problem in [1]
for INSERT/UPDATE.

HEAD + patch

       name       | bytes | pg_size_pretty
------------------+-------+----------------
 CachedPlanQuery  | 10280 | 10 kB
 CachedPlanSource | 14616 | 14 kB
 CachedPlan       | 13168 | 13 kB  ★ 710MB -> 13kB
(3 rows)

> > This patch completely sidesteps the DELETE case, which has more insidious
> > performance implications, but is also far less common, and whose solution
> > will likely be very different.
>
> Yeah, we should continue looking into the ways to make referenced-side
> RI checks be less bloated.

However, as already mentioned, the problem of memory bloat on DELETE
remains. This can be solved by the patch in [1], but I think it is too
much to apply that patch only for DELETE. What do you think?

[1] https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: simplifying foreign key/RI checks
Hi Amit-san,

Thanks for the answer!

> If you only tested insert/update on the referencing table, I would've
> expected to see nothing in the result of that query, because the patch
> eliminates all use of SPI in that case. I suspect the CachedPlan*
> memory contexts you are seeing belong to some early activity in the
> session. So if you try the insert/update in a freshly started
> session, you would see 0 rows in the result of that query.

That's right. The partition creation commands included in the test script
(rep.sql) were what used SPI. In a new session, I confirmed that no
CachedPlan is generated when only the INSERTs are executed.

# only execute INSERT

postgres=# INSERT INTO ps SELECT generate_series(1,4999);
INSERT 0 4999
postgres=# INSERT INTO pr SELECT i, i from generate_series(1,4999)i;
INSERT 0 4999
postgres=# SELECT name, sum(used_bytes) as bytes,
           pg_size_pretty(sum(used_bytes))
           FROM pg_backend_memory_contexts
           WHERE name LIKE 'Cached%' GROUP BY name;
 name | bytes | pg_size_pretty
------+-------+----------------
(0 rows)
★ No CachedPlan

> Hmm, the patch tries to solve a general problem that SPI plans are not
> being shared among partitions whereas they should be. So I don't
> think that it's necessarily specific to DELETE. Until we have a
> solution like the patch on this thread for DELETE, it seems fine to
> consider the other patch as a stopgap solution.

I see. So the other patch is a solution to the general problem of SPI
plans not being shared among partitions, not just to DELETE.
I agree with you; I think it is a solution to the current problem.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Wrong value in metapage of GIN INDEX.
Hi Moon-san.

Thank you for posting.
We are testing the GIN index on the JSONB type.
The default maintenance_work_mem (64MB) was fine in the usual cases.
However, this problem occurs when indexing very large JSONB data.

best regards,
Keisuke Kuroda

On Thu, Aug 29, 2019 at 17:20 Moon, Insung wrote:
> Dear Hackers.
>
> Kuroda-san and I are interested in the GIN index and have been testing
> various things.
> While testing, we are found a little bug.
> Some cases, the value of nEntries in the metapage was set to the wrong
> value.
>
> This is a reproduce of bug situation.
> =# SET maintenance_work_mem TO '1MB';
> =# CREATE TABLE foo(i jsonb);
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
>    generate_series(1, 1) AS i;
>
> # Input the same value again.
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
>    generate_series(1, 1) AS i;
>
> # Creates GIN Index.
> =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH (fastupdate=off);
>
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head     | 4294967295
> pending_tail     | 4294967295
> tail_free_size   | 0
> n_pending_pages  | 0
> n_pending_tuples | 0
> n_total_pages    | 74
> n_entry_pages    | 69
> n_data_pages     | 4
> n_entries        | 20004 <--★
> version          | 2
>
> In this example, the nentries value should be 10001 because the gin
> index stores duplicate values in one leaf (posting tree or posting
> list).
> But, if look at the nentries value of metapage using pageinspect, it
> is stored as 20004.
> So, Let's run the vacuum.
>
> =# VACUUM foo;
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head     | 4294967295
> pending_tail     | 4294967295
> tail_free_size   | 0
> n_pending_pages  | 0
> n_pending_tuples | 0
> n_total_pages    | 74
> n_entry_pages    | 69
> n_data_pages     | 4
> n_entries        | 10001 <--★
> version          | 2
>
> Ah. Run to the vacuum, nEntries is changing the normal value.
>
> There is a problem with the ginEntryInsert function. That calls the
> table scan when creating the gin index, ginBuildCallback function
> stores the new heap value inside buildstate struct.
> And next step, If GinBuildState struct is the size of the memory to be
> using is equal to or larger than the maintenance_work_mem value, run
> to input value into the GIN index.
> This process is a function called ginEntryInsert.
> The ginEntryInsert function called at this time determines that a new
> entry is added and increase the value of nEntries.
> However currently, ginEntryInsert is first to increase in the value of
> nEntries, and to determine if there are the same entries in the
> current GIN index.
> That causes the bug.
>
> The patch is very simple.
> Fix to increase the value of nEntries only when a non-duplicate GIN
> index leaf added.
>
> This bug detection and code fix worked with Kuroda-san.
>
> Best Regards.
> Moon.
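As a purely illustrative sketch of the fix Moon-san describes (all helper
names below are hypothetical placeholders, not the real GIN functions), the
point is to bump nEntries only after the duplicate check:

#include <stdbool.h>

/* Hypothetical helpers standing in for the real GIN entry-tree operations. */
extern bool entry_tree_contains(void *entrytree, const void *key);
extern void entry_tree_merge_items(void *entrytree, const void *key);
extern void entry_tree_add(void *entrytree, const void *key);

typedef struct BuildStatsSketch
{
    long        nEntries;       /* number of distinct entry-tree keys */
} BuildStatsSketch;

/*
 * Sketch of the fix: count an entry only when the key is not already
 * present, i.e. after the duplicate check, not before it.
 */
static void
entry_insert_sketch(void *entrytree, const void *key, BuildStatsSketch *stats)
{
    if (entry_tree_contains(entrytree, key))
    {
        /* duplicate key: merge into the existing posting list/tree */
        entry_tree_merge_items(entrytree, key);
        return;                 /* nEntries is NOT incremented */
    }

    entry_tree_add(entrytree, key);
    if (stats != NULL)
        stats->nEntries++;      /* counted only for a brand-new key */
}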
Re: Date-time extraneous fields with reserved keywords
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi Joseph,

Good catch. For the reserved words that are special values of the
Date/Time types ('now', 'today', 'tomorrow', 'yesterday', and 'allballs'),
I get an error even before applying the patch.

One thing I noticed is that the following SQL still returns a normal
result even after applying the patch.

postgres=# select timestamp 'epoch 01:01:01';
      timestamp
---------------------
 1970-01-01 00:00:00
(1 row)

When 'epoch', 'infinity', or '-infinity' is specified together with a
time, the time given in the SQL is not included in the result. I think
it might be better to treat this pattern as an error as well.
What do you think?

As a side note, reserved words such as 'today', 'tomorrow', and
'yesterday' can be used together with a time.

postgres=# select timestamp 'today 01:01:01';
      timestamp
---------------------
 2023-03-03 01:01:01
(1 row)

Best Regards,
Keisuke Kuroda
NTT Comware

The new status of this patch is: Waiting on Author
Re: Date-time extraneous fields with reserved keywords
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Thank you for the response and the new patch.

The scope of impact is limited to 'epoch' and 'infinity'.
Also, it is unlikely that these reserved words will be used in
combination with a date/time, so this patch is appropriate.

The new status of this patch is: Ready for Committer
Re: control max length of parameter values logged
Hi,

BIND parameter truncation looks good to me.
Logs can be very hard to read due to very long parameter values being
recorded.

Currently, the parameter is truncated keeping its left part,
e.g. "----" to "--...".

Wouldn't it also be useful to keep the right part instead,
e.g. "----" to "...--"?

It is sometimes nice to be able to check the end of the string, for
example when many parameters differ only in their last characters.

Best Regards,
Keisuke Kuroda
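To illustrate the suggestion, a minimal sketch of keeping the tail instead
of the head of an over-long value (the helper name is hypothetical and not
part of the patch):

#include "postgres.h"

/*
 * Hypothetical helper: keep the last maxlen characters of an over-long
 * parameter value and mark the truncation at the front.
 */
static char *
truncate_keep_tail(const char *value, int maxlen)
{
    int         len = strlen(value);
    char       *result;

    if (len <= maxlen)
        return pstrdup(value);

    /* "..." marker in front, then the trailing maxlen characters */
    result = palloc(maxlen + 3 + 1);
    memcpy(result, "...", 3);
    memcpy(result + 3, value + (len - maxlen), maxlen + 1); /* incl. '\0' */
    return result;
}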
In PG12, query with float calculations is slower than PG11
postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
           select (2 * a), (2 * b), (2 * c), (2 * d), (2 * e) from realtest;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on public.realtest (cost=0.00..307328.38 rows=10828150 width=40) (actual time=0.012..4009.012 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.038 ms
 Execution Time: 4679.162 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 137675
Overhead  Command   Shared Object  Symbol
  31.43%  postgres  postgres       [.] ExecInterpExpr
  14.24%  postgres  postgres       [.] float84mul
  10.40%  postgres  [vdso]         [.] __vdso_clock_gettime
★  5.41%  postgres  libc-2.17.so   [.] __isinf
   4.63%  postgres  postgres       [.] tts_buffer_heap_getsomeattrs
   4.03%  postgres  postgres       [.] ExecScan
   3.54%  postgres  libc-2.17.so   [.] __clock_gettime
   3.12%  postgres  postgres       [.] HeapTupleSatisfiesVisibility
   2.36%  postgres  postgres       [.] heap_getnextslot
   2.16%  postgres  postgres       [.] heapgettup_pagemode
   2.09%  postgres  postgres       [.] standard_ExecutorRun
   2.07%  postgres  postgres       [.] SeqNext
   2.03%  postgres  postgres       [.] ExecProcNodeInstr
   2.03%  postgres  postgres       [.] tts_virtual_clear

PG 12 is still slower compared to PG 11, but the __isinf() situation is
better with the patch.

Best Regards,
Keisuke Kuroda

check-float-val.patch
Description: Binary data
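For context, a hedged sketch of the cheaper check ordering discussed in
this thread (simplified, with error reporting reduced to elog; it only
mirrors the style of the float helpers in src/include/utils/float.h and is
not the attached patch itself): test the result first, and evaluate the
"was infinity/zero expected" operands only on the rare path.

#include "postgres.h"

#include <math.h>

static inline float8
float8_mul_sketch(float8 val1, float8 val2)
{
    float8      result = val1 * val2;

    /* common path pays for a single isinf() on the result */
    if (unlikely(isinf(result)) && !isinf(val1) && !isinf(val2))
        elog(ERROR, "value out of range: overflow");
    if (unlikely(result == 0.0) && val1 != 0.0 && val2 != 0.0)
        elog(ERROR, "value out of range: underflow");

    return result;
}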
Re: In PG12, query with float calculations is slower than PG11
Hi,

I have been testing with a newer compiler (clang-7), and the result is a
bit different, at least with clang-7.

Compiling PG 12.1 (even without the patch) with clang-7 results in
__isinf() no longer being a bottleneck; that is, you don't see it in the
profiler at all. So there is no issue for people who use the modern clang
toolchain, but maybe that's not everyone, so there would still be some
interest in doing something about this.

* clang

bash-4.2$ which clang
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang
bash-4.2$ clang -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rh/llvm-toolset-7.0/root/usr/bin
Found candidate GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Found candidate GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

** pg_config
---
CONFIGURE = '--prefix=/var/lib/pgsql/pgsql/12.1' 'CC=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang' 'PKG_CONFIG_PATH=/opt/rh/llvm-toolset-7.0/root/usr/lib64/pkgconfig'
CC = /opt/rh/llvm-toolset-7.0/root/usr/bin/clang
---

* result (PostgreSQL 12.1, even without the patch)

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
           select (2 * a), (2 * b), (2 * c), (2 * d), (2 * e) from realtest;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on public.realtest (cost=0.00..288697.59 rows=1115 width=40) (actual time=0.012..3878.284 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.038 ms
 Execution Time: 4533.767 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 127500
Overhead  Command   Shared Object  Symbol
  33.92%  postgres  postgres       [.] ExecInterpExpr
  13.27%  postgres  postgres       [.] float84mul
  10.86%  postgres  [vdso]         [.] __vdso_clock_gettime
   5.49%  postgres  postgres       [.] tts_buffer_heap_getsomeattrs
   3.96%  postgres  postgres       [.] ExecScan
   3.25%  postgres  libc-2.17.so   [.] __clock_gettime
   3.16%  postgres  postgres       [.] heap_getnextslot
   2.41%  postgres  postgres       [.] tts_virtual_clear
   2.39%  postgres  postgres       [.] SeqNext
   2.22%  postgres  postgres       [.] InstrStopNode

Best Regards,
Keisuke Kuroda

On Fri, Feb 7, 2020 at 3:48 Andres Freund wrote:
> Hi,
>
> On 2020-02-06 11:03:51 -0500, Tom Lane wrote:
> > Andres seems to be of the opinion that the compiler should be willing
> > to ignore the semantic requirements of the C standard in order
> > to rearrange the code back into the cheaper order. That sounds like
> > wishful thinking to me ... even if it actually works on his compiler,
> > it certainly isn't going to work for everyone.
>
> Sorry, but, uh, what are you talking about? Please tell me which single
> standards violation I'm advocating for?
>
> I was asking about the inlining bit because the first email of the topic
> explained that as the problem, which I don't believe can be the full
> explanation - and it turns out it isn't. As Amit Langote's followup
> email explained, there's the whole issue of the order of checks being
> inverted - which is clearly bad. And wholly unrelated to inlining.
>
> And I asked about __isinf() being used because there are issues with
> accidentally ending up with the non-intrinsic version of isinf() when
> not using gcc, due to badly written standard library headers.
>
> > The patch looks unduly invasive to me, but I think that it might be
> > right that we should go back to a macro-based implementation, because
> > otherwise we don't have a good way to be certain that the function
> > parameter won't get evaluated first.
>
> I'd first like to see some actual evidence of this being a problem,
> rather than just the order of the checks.
>
> > (Another reason to do so is so that the file/line numbers generated
> > for the error reports go back to being at least a little bit useful.)
> > We could use local variables within the macro to avoid double evals,
> > if anyone thinks that's actually important --- I don't.
>
> I don't think that's necessarily a good idea. In fact, I think
Re: In PG12, query with float calculations is slower than PG11
Thank you very much everyone.

The improvement was confirmed even when PG12_STABLE was built with gcc 4.8.5.

* PG_12_STABLE
* gcc 4.8.5

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
           select (2 * a), (2 * b), (2 * c), (2 * d), (2 * e) from realtest;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on public.realtest (cost=0.00..288692.14 rows=873 width=40) (actual time=0.012..4118.432 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.034 ms
 Execution Time: 4811.957 ms
(5 rows)

  32.03%  postgres  postgres      [.] ExecInterpExpr
  12.28%  postgres  postgres      [.] float84mul
   9.62%  postgres  [vdso]        [.] __vdso_clock_gettime
   6.45%  postgres  libc-2.17.so  [.] __isinf
   5.15%  postgres  postgres      [.] tts_buffer_heap_getsomeattrs
   3.83%  postgres  postgres      [.] ExecScan

Best Regards,
Keisuke Kuroda

On Fri, Feb 14, 2020 at 13:29 Amit Langote wrote:
> On Fri, Feb 14, 2020 at 3:47 AM Andres Freund wrote:
> > On 2020-02-13 13:40:43 -0500, Tom Lane wrote:
> > > ... and pushed. One other change I made beyond those suggested
> > > was to push the zero-divide ereport's out-of-line as well.
> >
> > Thanks!
>
> Thank you all.
>
> I repeated some of the tests I did earlier and things look good.
>
> gcc-8
> =
>
> HEAD
>
> latency average = 296.842 ms
>
> 42.05%  postgres  postgres      [.] ExecInterpExpr
> 15.14%  postgres  postgres      [.] float8_accum
>  9.32%  postgres  libc-2.17.so  [.] __isinf
>  7.32%  postgres  postgres      [.] dsqrt
>  5.67%  postgres  postgres      [.] float8mul
>  4.20%  postgres  postgres      [.] ftod
>
> 11.7
>
> latency average = 289.439 ms
>
> 41.52%  postgres  postgres      [.] ExecInterpExpr
> 13.59%  postgres  libc-2.17.so  [.] __isinf
> 10.98%  postgres  postgres      [.] float8_accum
>  8.26%  postgres  postgres      [.] dsqrt
>  6.17%  postgres  postgres      [.] float8mul
>  3.65%  postgres  postgres      [.] ftod
>
> clang-7
> ===
>
> HEAD
>
> latency average = 233.735 ms
>
> 43.84%  postgres  postgres  [.] ExecInterpExpr
> 15.17%  postgres  postgres  [.] float8_accum
>  8.25%  postgres  postgres  [.] dsqrt
>  7.35%  postgres  postgres  [.] float8mul
>  5.84%  postgres  postgres  [.] ftod
>  3.78%  postgres  postgres  [.] tts_buffer_heap_getsomeattrs
>
> 11.7
>
> latency average = 221.009 ms
>
> 49.55%  postgres  postgres  [.] ExecInterpExpr
> 12.05%  postgres  postgres  [.] float8_accum
>  8.97%  postgres  postgres  [.] dsqrt
>  6.72%  postgres  postgres  [.] float8mul
>  5.62%  postgres  postgres  [.] ftod
>  2.18%  postgres  postgres  [.] slot_deform_tuple
>
> HEAD and PG 11 are now comparable even when built with gcc.
>
> Regards,
> Amit
Re: Exposure related to GUC value of ssl_passphrase_command
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

I tested the patch on the master branch (addd034) and it works fine.

I think the test case checking that a non-superuser can't see this
parameter is unnecessary; there is a similar test for the
pg_read_all_settings role.

The new status of this patch is: Ready for Committer
Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
ndId);

        if (command_id < change->data.command_id)
        {
            command_id = change->data.command_id;

            if (!snapshot_now->copied)
            {
                /* we don't use the global one anymore */
                snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
                                                     txn, command_id);
            }

            snapshot_now->curcid = command_id;

            TeardownHistoricSnapshot(false);
            SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);

            /*
             * Every time the CommandId is incremented, we could
             * see new catalog contents, so execute all
             * invalidations.
             */
            ReorderBufferExecuteInvalidations(rb, txn);
        }

        break;

Do you have any solutions?

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com

#!/bin/bash

PGHOME="/usr/pgsql-12"
CLUSTER_PUB="/home/postgres/relfilenode/data_pub"
CLUSTER_SUB="/home/postgres/relfilenode/data_sub"

${PGHOME}/bin/pg_ctl -D ${CLUSTER_PUB} stop
${PGHOME}/bin/pg_ctl -D ${CLUSTER_SUB} stop
rm -rf ${CLUSTER_PUB}
rm -rf ${CLUSTER_SUB}

${PGHOME}/bin/initdb -D ${CLUSTER_PUB} --no-locale -E UTF8 -k

## Configuration
echo "shared_buffers = 1GB" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_locks_per_transaction = 1" >> ${CLUSTER_PUB}/postgresql.conf
echo "synchronous_commit = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "archive_mode = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "fsync = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "full_page_writes = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "autovacuum = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "checkpoint_timeout = 120min" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_wal_size = 20GB" >> ${CLUSTER_PUB}/postgresql.conf
echo "wal_level = logical" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_wal_senders = 10" >> ${CLUSTER_PUB}/postgresql.conf
echo "wal_sender_timeout = 0" >> ${CLUSTER_PUB}/postgresql.conf
echo "wal_receiver_timeout = 0" >> ${CLUSTER_PUB}/postgresql.conf
echo "port = 5432" >> ${CLUSTER_PUB}/postgresql.conf

cp -p -r ${CLUSTER_PUB} ${CLUSTER_SUB}
echo "port = 6432" >> ${CLUSTER_SUB}/postgresql.conf

${PGHOME}/bin/pg_ctl -D ${CLUSTER_PUB} start

## create partition
${PGHOME}/bin/psql -p 5432 << "EOF"
DO $$
DECLARE
  i INT;
  j INT;
  schema TEXT;
  tablename TEXT;
  partname TEXT;
BEGIN
  FOR i IN 1 .. 2 LOOP
    schema := 'nsp_' || to_char(i, 'FM000');
    tablename := 'tbl_' || to_char(i, 'FM000');
    EXECUTE 'CREATE SCHEMA ' || schema;
    EXECUTE 'CREATE TABLE ' || schema || '.' || tablename || ' (
      big01 BIGINT, big02 BIGINT, big03 BIGINT, big04 BIGINT,
      big05 BIGINT, big06 BIGINT, big07 BIGINT, big08 BIGINT,
      big09 BIGINT, big10 BIGINT, big11 BIGINT, big12 BIGINT,
      byt01 BYTEA, byt02 BYTEA
      ) PARTITION BY RANGE (big02)';
    FOR j IN 1 .. 1000 LOOP
      partname := 'part_' || to_char(j, 'FM');
      EXECUTE 'CREATE TABLE ' || schema || '.' || partname ||
              ' PARTITION OF ' || schema || '.' || tablename ||
              ' FOR VALUES FROM (' || (j-1)*10 || ') TO (' || (j)*10 || ')';
      EXECUTE 'ALTER TABLE ' || schema || '.' || partname ||
              ' ADD PRIMARY KEY (big01)';
    END LOOP;
    EXECUTE 'CREATE INDEX ON ' || schema || '.' || tablename || ' (big03)';
  END LOOP;
END;
$$ LANGUAGE plpgsql;
EOF

${PGHOME}/bin/psql -p 5432 << "EOF"
CREATE TABLE test(id int primary key);
CREATE PUBLICATION pubdb_test FOR TABLE public.test;
EOF

${PGHOME}/bin/pg_ctl -D ${CLUSTER_SUB} start

${PGHOME}/bin/psql -p 6432 << "EOF"
CREATE TABLE test(id int primary key);
CREATE SUBSCRIPTION subdb_test CONNECTION 'host=localhost dbname=postgres'
  PUBLICATION pubdb_test WITH (copy_data = false);
EOF

# insert data (create relfilenodemaphash)
${PGHOME}/bin/psql -p 5432 << "EOF"
DO $$
DECLARE
  i INT;
  j INT;
  schema TEXT;
  tablename TEXT;
BEGIN
  FOR i IN 1 .. 2 LOOP
    schema := 'nsp_' || to_char(i, 'FM000');
    tablename := 'tbl_' || to_char(i, 'FM000');
    EXECUTE 'INSERT INTO ' || schema || '.' || tablename ||
            ' SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea
              FROM generate_series(0,,1) AS i';
  END LOOP;
END;
$$ LANGUAGE plpgsql;
EOF

${PGHOME}/bin/psql -p 5432 << "EOF"
TRUNCATE TABLE nsp_001.tbl_001;
EOF

echo "---walsender process info---"
ps -aux | grep walsender | grep -v grep
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Amit,

Thank you for the reply!

> However, after commit c55040ccd0 (When wal_level=logical,
> write invalidations at command end into WAL so that decoding can use
> this information.) we actually know exactly when we need to execute
> each invalidation.

I see, thank you for your explanation.
I'll try to think about a solution using XLOG_INVALIDATIONS, referring to
the thread "PATCH: logical_work_mem and logical streaming of large
in-progress transactions".

Best Regards,
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Amit,

> You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.

Thank you, that's right. XLOG_INVALIDATIONS is added at COMMIT,
so I need to refer to XLOG_XACT_INVALIDATIONS.

> Okay. Feel free to clarify your questions if you have any? Are you
> interested in writing a patch for the same?

Thank you! Yes, I would like to write a patch.
If there is already a thread or patch where this idea has been discussed,
please let me know.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Amit,

> I don't have a patch for this idea but you can refer [1]
> (v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
> what I was trying to say about registering the invalidations in the
> form of ReorderBufferChange. We have something along the lines of what
> I described above in that patch but we have removed it because we need
> all the invalidations to be accumulated to handle aborts and we don't
> want two different mechanisms to store invalidations.

Thanks, I read the patch
(v25-0002-Issue-individual-invalidations-with-wal_level-lo) and the review.
I understand the following:

* In v25-0002, there were two different mechanisms,
  XLOG_XACT_INVALIDATIONS (ReorderBufferAddInvalidation) and
  XLOG_INVALIDATIONS (ReorderBufferAddInvalidations).
* After review, XLOG_XACT_INVALIDATIONS was made to generate all
  invalidation messages.

I'm going to write a patch along the following lines (a rough sketch of
the consuming side follows below):

* Collect the invalidations from XLOG_XACT_INVALIDATIONS in the form of
  a ReorderBufferChange.
* Execute the invalidations when processing a
  REORDER_BUFFER_CHANGE_INVALIDATION change.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
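As an illustration of the second bullet above (a minimal sketch; the
change's field names under data.inval are assumptions based on this plan,
not necessarily the committed layout):

#include "postgres.h"

#include "replication/reorderbuffer.h"
#include "utils/inval.h"

/*
 * Sketch only: apply one accumulated invalidation change while replaying
 * a transaction's changes during decoding.
 */
static void
apply_invalidation_change_sketch(ReorderBufferChange *change)
{
    /*
     * Execute just the messages attached to this change, so later catalog
     * lookups in the decoding session see the new contents, instead of
     * re-running every invalidation accumulated so far at each command-id
     * increment.
     */
    for (uint32 i = 0; i < change->data.inval.ninvalidations; i++)
        LocalExecuteInvalidationMessage(&change->data.inval.invalidations[i]);
}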
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Dilip, Amit,

Thank you for the patch!
I tested the patch on the master HEAD (9796f455) and it works fine.

* make installcheck-world: passed
* time during which the walsender process uses 100% of one CPU core when
  TRUNCATEing 1000 partitions: 1s or less
  ** before the patch: 230s

There is a declaration of ReorderBufferAddInvalidation in reorderbuffer.h,
but I don't think it's needed:

src/include/replication/reorderbuffer.h
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+                                  int nmsgs, SharedInvalidationMessage *msgs);

If possible, I'd like to improve this in PG13 and earlier as well, but I
recognize that it's difficult because the fix relies on a mechanism that
only exists in PG14 or later...

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
> > I test the patch on the master HEAD(9796f455) and it works fine.
> > * make installcheck-world: passed
> > * walsender process continues to use 100% of the CPU 1core when
> >   TRUNCATE 1000 partition: 1s or less
> >   ** before patch : 230s
>
> Does this result indicate that it is still CPU bound but it does the
> actual decoding and completes in 1s instead of spending 230s mainly to
> execute unnecessary invalidations?

Okay, to check the decoding and invalidation times, I measured the time
until the results are returned to pg_recvlogical, and collected perf data.

Before the patch, most of the processing time was spent in
hash_seq_search inside RelfilenodeMapInvalidateCallback.
After the patch, this problem is resolved.

# test
(1) start pg_recvlogical
(2) INSERT into 2000 partitions to populate RelfilenodeMapHash
(3) TRUNCATE 1000 partitions

(1)
pg_recvlogical --create-slot --start -f - --if-not-exists \
  --plugin=test_decoding --slot=test1 --dbname=postgres --username=postgres \
  --option=include-timestamp \
  | gawk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'

(2)
DO $$
DECLARE
  i INT;
  j INT;
  schema TEXT;
  tablename TEXT;
BEGIN
  FOR i IN 1 .. 2 LOOP
    schema := 'nsp_' || to_char(i, 'FM000');
    tablename := 'tbl_' || to_char(i, 'FM000');
    EXECUTE 'INSERT INTO ' || schema || '.' || tablename ||
            ' SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea
              FROM generate_series(0,,1) AS i';
  END LOOP;
END;
$$ LANGUAGE plpgsql;

(3)
TRUNCATE TABLE nsp_001.tbl_001;

# before
(3) TRUNCATE decode + invalidation = 222s

2020-10-01 16:58:29 BEGIN 529
2020-10-01 17:02:11 COMMIT 529 (at 2020-10-01 16:58:29.281747+09)

(Before the patch was applied, I was concerned that the
"table nsp_001.tbl_001 ... TRUNCATE" line was not output.)

Samples: 228K of event 'cpu-clock:uhH', Event count (approx.): 5708075
Overhead  Command   Shared Object  Symbol
- 96.00%  postgres  postgres       [.] hash_seq_search
     hash_seq_search
   - RelfilenodeMapInvalidateCallback
     - LocalExecuteInvalidationMessage
         ReorderBufferExecuteInvalidations
         ReorderBufferProcessTXN
         ReorderBufferCommit
         DecodeCommit
         DecodeXactOp
         LogicalDecodingProcessRecord
         XLogSendLogical
         WalSndLoop
         StartLogicalReplication
         exec_replication_command
         PostgresMain
         BackendRun
         BackendStartup
         ServerLoop
         PostmasterMain
         main
         __libc_start_main
         _start
+  0.77%  postgres  postgres       [.] LocalExecuteInvalidationMessage
   0.47%  postgres  postgres       [.] hash_bytes_uint32
   0.42%  postgres  postgres       [.] hash_search_with_hash_value
   0.37%  postgres  postgres       [.] RelfilenodeMapInvalidateCallback
   0.32%  postgres  postgres       [.] CatCacheInvalidate
   0.23%  postgres  postgres       [.] PlanCacheRelCallback
   0.23%  postgres  postgres       [.] ReorderBufferExecuteInvalidations

# after
(3) TRUNCATE decode + invalidation = 1s or less

2020-10-01 17:09:43 BEGIN 537
2020-10-01 17:09:43 table nsp_001.tbl_001, nsp_001.part_0001, nsp_001.part_0002 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-01 17:09:43 COMMIT 537 (at 2020-10-01 17:09:43.192989+09)

Samples: 337 of event 'cpu-clock:uhH', Event count (approx.): 8425
Overhead  Command   Shared Object       Symbol
- 47.48%  postgres  postgres            [.] hash_seq_search
     hash_seq_search
     RelfilenodeMapInvalidateCallback
     LocalExecuteInvalidationMessage
     ReorderBufferExecuteInvalidations
     ReorderBufferProcessTXN
     ReorderBufferCommit
     DecodeCommit
     DecodeXactOp
     LogicalDecodingProcessRecord
     XLogSendLogical
     WalSndLoop
     StartLogicalReplication
     exec_replication_command
     PostgresMain
     BackendRun
     BackendStartup
     ServerLoop
     PostmasterMain
     main
     __libc_start_main
     _start
+  3.26%  postgres  postgres            [.] heap_hot_search_buffer
+  2.67%  postgres  postgres            [.] pg_comp_crc32c_sse42
+  2.08%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
+  2.08%  postgres  postgres            [.] DecodeXLogRecord
+  2.08%  postgres  postgres            [.] hash_search_with_hash_value
+  1.48%  postgres  libpthread-2.17.so  [.] __libc_recv
+  1.19%  postgres  libc-2.17.so        [.] __memset_sse2
+  1.19%  postgres  libpthread-2.17.so  [.] __errno_location
+  1.19%  postgres  postgres            [.] LocalExecuteInvalidationMessage
+  1.19%  postgres  postgres            [.] ReadPageInternal
+  1.19%  postgres  postgres            [.] XLogReadRecord
+  1.19%  postgres  postgres            [.] socket_is_send_pending
     hash_seq_search

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
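For reference on why hash_seq_search dominates these profiles:
RelfilenodeMapInvalidateCallback scans the entire RelfilenodeMapHash for
every invalidation message, so 1000 truncated partitions times a large map
adds up quickly. A simplified, approximate sketch of that callback
(paraphrased; treat the struct layout and details as assumptions rather
than the exact relfilenodemap.c code):

#include "postgres.h"

#include "utils/hsearch.h"

/* Simplified/approximate sketch of the relfilenode map invalidation. */
static HTAB *RelfilenodeMapHash = NULL;

typedef struct RelfilenodeMapEntrySketch
{
    struct
    {
        Oid         reltablespace;
        Oid         relfilenode;
    }           key;
    Oid         relid;
} RelfilenodeMapEntrySketch;

static void
RelfilenodeMapInvalidateCallbackSketch(Datum arg, Oid relid)
{
    HASH_SEQ_STATUS status;
    RelfilenodeMapEntrySketch *entry;

    /* Every call walks the whole hash, even for a single relation. */
    hash_seq_init(&status, RelfilenodeMapHash);
    while ((entry = hash_seq_search(&status)) != NULL)
    {
        if (relid == InvalidOid ||      /* complete cache reset */
            entry->relid == relid)      /* invalidation of one relation */
            (void) hash_search(RelfilenodeMapHash, &entry->key,
                               HASH_REMOVE, NULL);
    }
}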
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Dilip, Amit,

> > 5. Can you please once repeat the performance test done by Keisuke-San
> > to see if you have similar observations? Additionally, see if you are
> > also seeing the inconsistency related to the Truncate message reported
> > by him and if so why?
>
> Okay, I will set up and do this test early next week. Keisuke-San,
> can you send me your complete test script?

Yes, I've attached a test script (test_pg_recvlogical.sh).

Sorry, the issue with TRUNCATE not being output was due to a build
mistake on my part. Even before the patch, TRUNCATE is decoded and
output correctly, so please check only the performance.
I have tested it again and the results are below.

Also, one palloc call still had MemoryContextAlloc-style arguments,
which prevented me from applying the patch, so I've fixed only that part.

# test script
Please set PGHOME and CLUSTER_PUB before running.

sh test_pg_recvlogical.sh

# perf command
perf record --call-graph dwarf -p [walsender pid]
perf report -i perf.data --no-children

# before patch
decode + invalidation = 222s

2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001, nsp_001.part_0001 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)

Samples: 229K of event 'cpu-clock:uhH', Event count (approx.): 5734725
Overhead  Command   Shared Object  Symbol
- 96.07%  postgres  postgres       [.] hash_seq_search
     hash_seq_search
   - RelfilenodeMapInvalidateCallback
     - LocalExecuteInvalidationMessage
         ReorderBufferExecuteInvalidations
         ReorderBufferProcessTXN
         ReorderBufferCommit
         DecodeCommit
         DecodeXactOp
         LogicalDecodingProcessRecord
         XLogSendLogical
         WalSndLoop
         StartLogicalReplication
         exec_replication_command
         PostgresMain
         BackendRun
         BackendStartup
         ServerLoop
         PostmasterMain
         main
         __libc_start_main
         _start
   0.74%  postgres  postgres       [.] LocalExecuteInvalidationMessage
   0.47%  postgres  postgres       [.] hash_bytes_uint32
   0.46%  postgres  postgres       [.] hash_search_with_hash_value
   0.37%  postgres  postgres       [.] RelfilenodeMapInvalidateCallback
   0.31%  postgres  postgres       [.] CatCacheInvalidate
   0.22%  postgres  postgres       [.] uint32_hash
   0.21%  postgres  postgres       [.] PlanCacheRelCallback
   0.20%  postgres  postgres       [.] hash_seq_init
   0.17%  postgres  postgres       [.] ReorderBufferExecuteInvalidations

# after patch
decode + invalidation = 1s or less

2020-10-02 15:20:08 BEGIN 509
2020-10-02 15:20:08 table nsp_001.tbl_001, nsp_001.part_0001 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 15:20:08 COMMIT 509 (at 2020-10-02 15:20:08.704503+09)

Samples: 310 of event 'cpu-clock:uhH', Event count (approx.): 7750
Overhead  Command   Shared Object       Symbol
- 49.35%  postgres  postgres            [.] hash_seq_search
     hash_seq_search
     RelfilenodeMapInvalidateCallback
     LocalExecuteInvalidationMessage
     ReorderBufferExecuteInvalidations
     ReorderBufferProcessTXN
     ReorderBufferCommit
     DecodeCommit
     DecodeXactOp
     LogicalDecodingProcessRecord
     XLogSendLogical
     WalSndLoop
     StartLogicalReplication
     exec_replication_command
     PostgresMain
     BackendRun
     BackendStartup
     ServerLoop
     PostmasterMain
     main
     __libc_start_main
     _start
   2.58%  postgres  libc-2.17.so        [.] __memset_sse2
   2.58%  postgres  libpthread-2.17.so  [.] __libc_recv
   1.61%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
   1.61%  postgres  libpthread-2.17.so  [.] __errno_location
   1.61%  postgres  postgres            [.] AllocSetAlloc
   1.61%  postgres  postgres            [.] DecodeXLogRecord
   1.29%  postgres  postgres            [.] AllocSetFree
   1.29%  postgres  postgres            [.] hash_bytes_uint32
   0.97%  postgres  [vdso]              [.] __vdso_gettimeofday
   0.97%  postgres  postgres            [.] LocalExecuteInvalidationMessage

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com

#!/bin/bash

PGHOME="/home/postgres/pgsql/master_20201002_1"
CLUSTER_PUB="/home/postgres/pgsql/master_20201002_1/data_pub"

${PGHOME}/bin/pg_ctl -D ${CLUSTER_PUB} stop
rm -rf ${CLUSTER_PUB}

${PGHOME}/bin/initdb -D ${CLUSTER_PUB} --no-locale -E UTF8 -k

## Configuration
echo "shared_buffers = 1GB" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_locks_per_transaction = 1" >> ${CLUSTER_PUB}/postgresql.conf
echo "synchronous_commit = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "archive_mode = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "fsync = off"
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Dilip,

> I could not see this issue even without the patch, it is taking less
> than 1s even without the patch. See below results
>
> 2020-10-08 13:00:49 BEGIN 509
> 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:...
> 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30)
>
> Am I missing something?

Thanks for running the tests.

It is the TRUNCATE decoding that takes time. INSERT decoding is fast,
even before the patch is applied.

2020-10-02 14:55:48 BEGIN 508
2020-10-02 14:55:48 table nsp_001.part_0001: INSERT ...
2020-10-02 14:55:49 COMMIT 508 (at 2020-10-02 14:55:48.744019+09)

However, TRUNCATE decoding is slow and takes 222s in my environment.

2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001 ... nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)

The script waits 10 seconds after the INSERT finishes before executing
the TRUNCATE, so please wait for it to run. When the TRUNCATE completes,
the walsender process should be at 100% CPU.

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Amit, Dilip,

I am glad this problem is solved. Thanks!

Best Regards,

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
Re: Separate GUC for replication origins
Hi Euler,

As you may have already read, separating this from the
max_replication_slots parameter is also discussed in the following thread:

https://www.postgresql.org/message-id/flat/CAA4eK1KF4MbyWmPj9ctNo8Fiei%3DK91RGYtzV5ELeCvR%3D_rqNgg%40mail.gmail.com#3012c5c18e40e21ac553b42d53249e42

I agree with separating the parameters.
I think we need to discuss the name of the parameter.

Parameter names were also discussed in the above thread, where
'max_replication_origin_states' and 'max_tracked_replication_origins'
were suggested. Personally, I find 'max_replication_origin_states'
easier to understand, although it is a long name.

Best Regards,
Keisuke Kuroda
NTT Comware
Re: [PATCH] Improve code coverage of network address functions
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

Thank you for your patch!
I have confirmed that the coverage improves to the expected value
(69.7% -> 83.0%).

source: commit 75eb9766ec201b62f264554019757716093e2a2f (HEAD)

## add with-openssl option for the ssl test

./configure --enable-coverage --enable-tap-tests --with-llvm --with-openssl CFLAGS=-O0
make check-world PG_TEST_EXTRA='ssl' -j 4 > /tmp/regress.log
make coverage-html

The new status of this patch is: Ready for Committer
Re: [PATCH] Improve code coverage of network address functions
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, failed
> Implements feature:      tested, failed
> Spec compliant:          tested, failed
> Documentation:           tested, failed

Sorry, I thought I had checked the right values in the commitfest app,
but they don't seem to have been reflected correctly.
The correct version is as follows:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed
Re: [PATCH] Improve code coverage of network address functions
I could not send it to the mailing list, so I'm resending it.
___

Hi Aleksander,

I have tested your patch.
I have confirmed that the coverage improves to the expected value
(69.8% -> 80.1%).
Your patch looks good to me.

## test and make coverage

source: commit 9a45a89c38f3257b13e09edf382e32fa28b918c2 (HEAD)

./configure --enable-coverage --enable-tap-tests --with-llvm CFLAGS=-O0
make check-world
make coverage-html

## Proposal to add a test for the set_masklen function

I think we could add the following tests to further improve the coverage.
Adding the following cases to the set_masklen() tests would help:

* netmask = -1
* netmask > max value (33 for IPv4)

```
-- check that netmask -1 is treated as the maximum value
SELECT set_masklen(cidr(text(c)), -1) FROM INET_TBL;
   set_masklen
-----------------
 192.168.1.0/32
 192.168.1.0/32
 192.168.1.0/32
 192.168.1.0/32
 192.168.1.0/32
 192.168.1.0/32
 10.0.0.0/32
 10.0.0.0/32
 10.1.2.3/32
 10.1.2.0/32
 10.1.0.0/32
 10.0.0.0/32
 10.0.0.0/32
 10.0.0.0/32
 10:23::f1/128
 10:23::8000/128
 :::1.2.3.4/128
(17 rows)

-- check that an invalid netmask is rejected
SELECT set_masklen(inet(text(i)), 33) FROM INET_TBL;
ERROR:  invalid mask length: 33
SELECT set_masklen(cidr(text(c)), 33) FROM INET_TBL;
ERROR:  invalid mask length: 33
```

This would increase coverage from 80.1% to 80.5%.
The improvement is small, but it would be worth adding.
What do you think?

As a side note, the macaddr/macaddr8 type handling in
convert_network_to_scalar() does not seem to be tested. It is related to
histograms on macaddr/macaddr8 columns and does not appear to be reachable
with a simple test, so it should be considered another time.

Best Regards,
Keisuke Kuroda
NTT Comware