Re: ECPG: proposal for new DECLARE STATEMENT
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      tested, passed
Spec compliant:          tested, passed
Documentation:           tested, passed

Recently I have been doing some work on ecpg, so I reviewed this patch. No problems were found.

The new status of this patch is: Ready for Committer
Re: WIP: Data at rest encryption
On Wed, 04 Sep 2019 00:56:15 +0800 Alvaro Herrera <mailto:alvhe...@2ndquadrant.com> wrote:

On 2019-Aug-02, Shawn Wang wrote:
> Hi Antonin,
> I am very glad to see the new patch. I used the public patches a long time ago.
> I did some tests such as streaming replication, running with a large amount of data, and temporary-file encryption.
> I found an issue in src/backend/storage/file/encryption.c: you should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
> There are also problems merging your patches onto the latest code in pg_ctl.c.

Is a new, fixed version going to be posted soon? It's been a while. Also, apologies if this has been asked before, but: how does this patch relate to the stuff being discussed in https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

Hi Álvaro,
Thank you for your reply. I was mainly pointing out the issues in src/backend/storage/file/encryption.c; if somebody wants to use these patches, Antonin needs to fix them. This does not relate to the stuff being discussed in the TDE thread. As far as I know, some companies use these patches to encrypt data, even though these issues remain.

Regards,
--
Shawn Wang
Re: WIP: Data at rest encryption
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      tested, failed
Spec compliant:          tested, failed
Documentation:           not tested

Hi Antonin,
I am very glad to see the new patch. I used the public patches a long time ago.
I did some tests such as streaming replication, running with a large amount of data, and temporary-file encryption.
I found an issue in src/backend/storage/file/encryption.c: you should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
There are also problems merging your patches onto the latest code in pg_ctl.c.

Regards,
--
Shawn Wang
Re: Problem with default partition pruning
Hi Hosoya-san,

I am sorry for the late reply. I merged the patches onto master (commit 44460d7017cde005d7a2e246db0b32375bfec15d), tested the cases I used against the previous patches, and found no issues. I see that you are now rethinking some of the details; I will continue to follow this and give feedback in time.

Regards,
--
Shawn Wang

On Thu, 27 Jun 2019 10:34:13 +0800 yuzuko wrote:
Hello,

On Tue, Jun 25, 2019 at 1:45 PM yuzuko <mailto:yuzukohos...@gmail.com> wrote:
>
> Hello Shawn, Alvaro,
>
> Thank you for testing patches and comments.
> Yes, there are two patches:
> (1) v4_default_partition_pruning.patch fixes problems with default partition pruning
> and (2) v3_ignore_contradictory_where_clauses_at_partprune_step.patch determines
> if a given clause contradicts a sub-partitioned table's partition constraint.
> I'll post two patches together next time.
>
> Anyway, I'll rebase two patches to apply on master and fix space.
>
Attach the latest patches discussed in this thread. I rebased the second patch (v5_ignore_contradictory_where_clauses_at_partprune_step.patch) on the current master. Could you please check them again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Hi hackers,

As arranged, I will complete the modifications to the front-end tools to support TDE. I have now finished modifying the pg_waldump, pg_resetwal, and pg_rewind tools.

My design:
1. Add two options, -D and -c, to the front-end tools. Use -c to obtain a password from the user to generate the KEK; use the -D option to obtain the cluster encryption settings, walkey, and relkey.
2. pg_waldump adds a WAL decryption function.
3. pg_rewind adds a WAL decryption function.
4. pg_resetwal adds a WAL encryption function.

Regards,
--
Shawn Wang

Masahiko Sawada wrote on Thu, Oct 31, 2019 at 10:25 PM:
> On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter wrote:
> >
> > -----Original Message-----
> > From: Masahiko Sawada Sent: Thursday, 15 August 2019 7:10 PM
> >
> > > BTW I've created PoC patch for cluster encryption feature. Attached
> > > patch set has done some items of TODO list and some of them can be used
> > > even for finer granularity encryption. Anyway, the implemented components
> > > are followings:
> >
> > Hello Sawada-san,
> >
> > I guess your original patch code may be getting a bit out-dated by the
> > ongoing TDE discussions, but I have done some code review of it anyway.
> >
> > Hopefully a few comments below can still be of use going forward:
> >
> > ---
> >
> > REVIEW COMMENTS
> >
> > * src/backend/storage/encryption/enc_cipher.c - For functions
> > EncryptionCipherValue/String maybe should log warnings for unexpected
> > values instead of silently assigning to default 0/"off".
> >
> > * src/backend/storage/encryption/enc_cipher.c - For function
> > EncryptionCipherString, the purpose of returning "unknown" is unclear because
> > that will map back to "off" again anyway via EncryptionCipherValue. Why not
> > just return "off" (with warning logged).
> >
> > * src/include/storage/enc_common.h - Typo in comment: "Encrypton".
> > > > * src/include/storage/encryption.h - The macro DataEncryptionEnabled may > be better to be using enum TDE_ENCRYPTION_OFF instead of magic number 0 > > > > * src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will > report error if USE_OPENSSL is not defined. The check seems premature > because it would fail even if the user is not using encryption. Shouldn't > the lack of openssl be OK when user is not using TDE at all (i.e. when > encryption is "none")? > > > > * src/backend/storage/encryption/kmgr.c - In function BootStrapMgr > suggest better to check if (bootstrap_data_encryption_cipher == > TDE_ENCRYPTION_OFF) using enum instead of the magic number 0. > > > > * src/backend/storage/encryption/kmgr.c - The function > run_cluster_passphrase_command function seems mostly a clone of an existing > run_ssl_passphrase_command function. Is it possible to refactor to share > the common code? > > > > * src/backend/storage/encryption/kmgr.c - The function > derive_encryption_key declares a char key_len. Why char? It seems int > everywhere else. > > > > * src/backend/bootstrap/bootstrap.c - Suggest better if variable > declaration bootstrap_data_encryption_cipher = 0 uses enum > TDE_ENCRYPTION_OFF instead of magic number 0 > > > > * src/backend/utils/misc/guc.c - It looks like the default value for GUC > variable data_encryption_cipher is AES128. Wouldn't "off" be the more > appropriate default value? Otherwise it seems inconsistent with the logic > of initdb (which insists that the -e option is mandatory if you wanted any > encryption). > > > > * src/backend/utils/misc/guc.c - There is a missing entry in the > config_group_names[]. The patch changed the config_group[] in guc_tables.h, > so I think there needs to be a matching item in the config_group_names. > > > > * src/bin/initdb/initdb.c - The function check_encryption_cipher would > disallow an encryption value of "none". 
Although maybe it is not very > useful to say -e none, it does seem inconsistent to reject it, given that > "none" was a valid value for the GUC variable data_encryption_cipher. > > > > * contrib/bloom/blinsert.c - In function btbuildempty the arguments for > PageEncryptionInPlace seem in the wrong order (forknum should be 2nd). > > > > * src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets > the arguments for PageEncryptionInPlace seem in the wrong order (forknum > should be 2nd). > > > > * src/backend/access/spgist/spginsert.c - In function spgbuildempty the > arguments for PageEncryptionInPlace seem in the wrong order (forknum should > be 2nd). This error looks repeated 3X. > > > > * in multiple files - The encryption enums have equivalent strings > (
Re: [Proposal] Global temporary tables
wenjing wrote on Thu, Apr 15, 2021 at 3:26 PM:
> HI Pavel
>
> I added user documentation.
> Please give me feedback.
>
>
> Wenjing

Hi Wenjing,

I have checked your documentation section, fixed a spelling mistake, and adjusted some sentences. All the modified content is in the new patch; please check it.

Regards
Shawn

0002-gtt-v47-doc.patch Description: Binary data
Re: Is it worth pushing conditions to sublink/subplan?
I tested it the way you said and increased the number of sub-tables: I created a hash-partitioned table with 1000 partitions. With your first SQL, the optimizer prunes the unnecessary partitions well. You can see the plan:

postgres=# explain analyze
postgres-# select temp.p1 from
postgres-# (
postgres(# select p1,p2 from test1.test1hashtable x where x.p1 = '1'
postgres(# union all
postgres(# (values('1','1'))
postgres(# ) temp,
postgres-# test1.test1hashtable y
postgres-# where y.p2 = temp.p2 and y.p1 = '1' and y.p1='1';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..25.55 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)
   Join Filter: (x.p2 = y.p2)
   ->  Seq Scan on test1hashtable826 y  (cost=0.00..12.75 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: (p1 = '1'::text)
   ->  Append  (cost=0.00..12.78 rows=2 width=64) (never executed)
         ->  Seq Scan on test1hashtable826 x  (cost=0.00..12.75 rows=1 width=64) (never executed)
               Filter: (p1 = '1'::text)
         ->  Result  (cost=0.00..0.01 rows=1 width=64) (never executed)
 Planning Time: 0.158 ms
 Execution Time: 0.022 ms
(10 rows)

But when the second one runs, the planning time reaches 13.942 ms. The plan:

postgres=# explain analyze
postgres-# select
postgres-# y.p1,
postgres-# (Select x.p2 from test1.test1hashtable x where y.p1 =x.p1 and y.p2=x.p2) as b
postgres-# from test1.test1hashtable y where p1 = '1' and p2 = '1';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on test1hashtable826 y  (cost=0.00..13318.30 rows=1 width=64) (actual time=0.004..0.047 rows=0 loops=1)
   Filter: ((p1 = '1'::text) AND (p2 = '1'::text))
   SubPlan 1
     ->  Append  (cost=0.00..13305.00 rows=1000 width=32) (never executed)
           ->  Seq Scan on test1hashtable1 x_1  (cost=0.00..13.30 rows=1 width=32) (never executed)
                 Filter: ((y.p1 = p1) AND (y.p2 = p2))
           ->  Seq Scan on test1hashtable1000 x_1000  (cost=0.00..13.30 rows=1 width=32) (never executed)
                 Filter: ((y.p1 = p1) AND (y.p2 = p2))
 Planning Time: 13.942 ms
 Execution Time: 4.899 ms
(2006 rows)

This is a very worthwhile thing to do.
In a relatively large business system, many partitioned tables and high concurrency are common; if the planning time is too long, it greatly affects the business.

regards,
Shawn.

Wenjing wrote on Tue, Aug 17, 2021 at 10:31 AM:
>
>
> On Aug 16, 2021, at 17:15, Wenjing wrote:
>
> Hi Hackers,
>
> Recently, an issue has been bothering me. This is about conditional
> push-down in SQL.
> I use cases from regression testing as an example.
> I found that the condition (B = 1) can be pushed down into the
> subquery; however, it cannot be pushed down to a sublink/subplan.
> If a sublink/subplan clause contains a partitioned table, it can be useful
> to get the conditions for pruning.
> So, is it worth pushing conditions to sublink/subplan?
> Anybody have any ideas?
>
> regards,
> Wenjing
>
> example:
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create table q (a int, b int, c int) partition by list (a);
> create table q1 partition of q for values in (1) partition by list (b);
> create table q11 partition of q1 for values in (1) partition by list (c);
> create table q111 partition of q11 for values in (1);
> create table q2 partition of q for values in (2) partition by list (b);
> create table q21 partition of q2 for values in (1);
> create table q22 partition of q2 for values in (2);
> insert into q22 values (2, 2, 3);
>
> Sorry, I messed up the structure of the table.
> It should be:
> create table ab (a int not null, b int not null) partition by list (a);
> create table ab_a2 partition of ab for values in(2) partition by list (b);
> create table ab_a2_b1 partition of ab_a2 for values in (1);
> create table ab_a2_b2 partition of ab_a2 for values in (2);
> create table ab_a2_b3 partition of ab_a2 for values in (3);
> create table ab_a1 partition of ab for values in(1) partition by list (b);
> create table ab_a1_b1 partition of ab_a1 for values in (1);
> create table ab_a1_b2 partition of ab_a1 for values in (2);
> create table ab_a1_b3 partition of ab_a1 for values in (3);
> create table ab_a3 partition of ab for values in(3) partition by list (b);
> create table ab_a3_b1 partition of ab_a3 for values in (1);
> create table ab_a3_b2 partition of ab_a3 for values in (2);
> create table ab_a3_b3 partition of ab_a3 for values in (3);
>
>
>
> postgres-# explain (costs off)
> postgres-# select temp.b from
> postgres-# (
> p
Trim the heap free memory
Hi hackers,

Currently, all processes in PostgreSQL actually use malloc to allocate and free memory. In the case of long connections where business queries are executed over extended periods, the distribution of memory can become extremely complex.

Under certain circumstances, a common issue in memory usage due to the caching strategy of malloc may arise: even if memory is released through the free function, it may not be returned to the OS in a timely manner. This can lead to high system memory usage, affecting performance and the operation of other applications, and may even result in Out-Of-Memory (OOM) errors.

To address this issue, I have developed a new function called pg_trim_backend_heap_free_memory, based on the existing pg_log_backend_memory_contexts function. This function triggers the specified process to execute the malloc_trim operation by sending signals, thereby releasing as much unreturned memory to the operating system as possible. This not only helps to optimize memory usage but can also significantly enhance system performance under memory pressure.

Here is an example of using the pg_trim_backend_heap_free_memory function to demonstrate its effect:

> CREATE OR REPLACE FUNCTION public.partition_create(schemaname character varying, numberofpartition integer)
> RETURNS integer
> LANGUAGE plpgsql
> AS $function$
> declare
> currentTableId integer;
> currentSchemaName varchar(100);
> currentTableName varchar(100);
> begin
> execute 'create schema ' || schemaname;
> execute 'create table ' || schemaname || '.' || schemaname || 'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
> currentTableId := 1;
> loop
> currentTableName := schemaname || '.' || schemaname || 'hashtable' || ltrim(currentTableId::varchar(10));
> execute 'create table ' || currentTableName || ' PARTITION OF ' || schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
> currentTableId := currentTableId + 1;
> if (currentTableId > numberofpartition) then exit; end if;
> end loop;
> return currentTableId - 1;
> END $function$;
>
> select public.partition_create('test3', 5000);
> select public.partition_create('test4', 5000);
> select count(*) from test4.test4hashtable a, test3.test3hashtable b where a.p1=b.p1;

You are now about to see the memory size of the process executing the query:

> postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01 postgres: postgres postgres [local] idle
> 02b65000-082e5000 rw-p 00:00 0 [heap]
> Size:           89600 kB
> KernelPageSize:     4 kB
> MMUPageSize:        4 kB
> Rss:            51332 kB
> Pss:            51332 kB

After use pg_trim_backend_heap_free_memory, you will see:

> postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
> 2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of PID 68673
>  pg_trim_backend_heap_free_memory
> ----------------------------------
>  t
> (1 row)
>
> 02b65000-082e5000 rw-p 00:00 0 [heap]
> Size:           89600 kB
> KernelPageSize:     4 kB
> MMUPageSize:        4 kB
> Rss:             4888 kB
> Pss:             4888 kB
> postgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01 postgres: postgres postgres [local] idle

Looking forward to your feedback,

Regards,
--
Shawn Wang

Now trimheapfreemeory.patch Description: Binary data
Re: Trim the heap free memory
Thank you, Rafia. Here is a v2 patch.

Rafia Sabih wrote on Fri, Aug 23, 2024 at 6:30 PM:
>
> On Fri, 23 Aug 2024 at 10:54, shawn wang wrote:
>
>> Hi hackers,
>>
>> Currently, all processes in PostgreSQL actually use malloc to allocate
>> and free memory. In the case of long connections where business queries are
>> executed over extended periods, the distribution of memory can become
>> extremely complex.
>>
>> Under certain circumstances, a common issue in memory usage due to the
>> caching strategy of malloc may arise: even if memory is released through
>> the free function, it may not be returned to the OS in a timely manner.
>> This can lead to high system memory usage, affecting performance and the
>> operation of other applications, and may even result in Out-Of-Memory (OOM)
>> errors.
>>
>> To address this issue, I have developed a new function called
>> pg_trim_backend_heap_free_memory, based on the existing
>> pg_log_backend_memory_contexts function. This function triggers the
>> specified process to execute the malloc_trim operation by sending
>> signals, thereby releasing as much unreturned memory to the operating
>> system as possible. This not only helps to optimize memory usage but can
>> also significantly enhance system performance under memory pressure.
>>
>> Here is an example of using the pg_trim_backend_heap_free_memory
>> function to demonstrate its effect:
>>
>>> CREATE OR REPLACE FUNCTION public.partition_create(schemaname character
>>> varying, numberofpartition integer)
>>> RETURNS integer
>>> LANGUAGE plpgsql
>>> AS $function$
>>> declare
>>> currentTableId integer;
>>> currentSchemaName varchar(100);
>>> currentTableName varchar(100);
>>> begin
>>> execute 'create schema ' || schemaname;
>>> execute 'create table ' || schemaname || '.' 
|| schemaname || 'hashtable >>> (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9 >>> name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);'; >>> currentTableId := 1; >>> loop >>> currentTableName := schemaname || '.' || schemaname || 'hashtable' || >>> ltrim(currentTableId::varchar(10)); >>> execute 'create table ' || currentTableName || ' PARTITION OF ' || >>> schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS >>> ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')'; >>> currentTableId := currentTableId + 1; >>> if (currentTableId > numberofpartition) then exit; end if; >>> end loop; >>> return currentTableId - 1; >>> END $function$; >>> >>> select public.partition_create('test3', 5000); >>> select public.partition_create('test4', 5000); >>> select count(*) from test4.test4hashtable a, test3.test3hashtable b >>> where a.p1=b.p1; >> >> You are now about to see the memory size of the process executing the >> query. >> >>> postgres 68673 1.2 0.0 610456 124768 ?Ss 08:25 0:01 >>> postgres: postgres postgres [local] idle >>> Size: 89600 kB >>> KernelPageSize:4 kB >>> MMUPageSize: 4 kB >>> Rss: 51332 kB >>> Pss: 51332 kB >> >> 02b65000-082e5000 rw-p 00:00 0 >>> [heap] >>> >> >> >> After use pg_trim_backend_heap_free_memory, you will see: >> >>> postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid()); >>> 2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of >>> PID 68673 >>> pg_trim_backend_heap_free_memory >>> -- >>> t >>> (1 row) >>> 02b65000-082e5000 rw-p 00:00 0 >>> [heap] >>> Size: 89600 kB >>> KernelPageSize:4 kB >>> MMUPageSize: 4 kB >>> Rss:4888 kB >>> Pss:4888 kB >> >> postgres 68673 1.2 0.0 610456 75244 ?Ss 08:26 0:01 >>> postgres: postgres postgres [local] idle >>> >> >> Looking forward to your feedback, >> >> Regards, >> >> -- >> Shawn Wang >> >> >> Now >> > Liked the idea. 
Unfortunately, at the moment it is giving compilation > error -- > > make[4]: *** No rule to make target `memtrim.o', needed by > `objfiles.txt'. Stop. > -- > Regards, > Rafia Sabih > v2-0001-Trim-Heap-Free-Memory.patch Description: Binary data
Re: Trim the heap free memory
Hi Ashutosh, thank you for your response.

Firstly, malloc caches memory for performance, so executing malloc_trim() reduces the efficiency of subsequent memory use. Secondly, malloc_trim() locks and traverses the bins, then calls madvise on the memory that can be released; when the bins hold a lot of memory, the traversal time grows accordingly. I once ran malloc_trim() at the end of every query, which resulted in a 20% performance drop. That is why I expose it as an explicitly invoked function instead.

The new v2 patch includes the previously omitted code.

Ashutosh Bapat wrote on Fri, Aug 23, 2024 at 8:02 PM:
> Hi Shawn,
>
> On Fri, Aug 23, 2024 at 2:24 PM shawn wang
> wrote:
> >
> > Hi hackers,
> >
> > Currently, all processes in PostgreSQL actually use malloc to allocate
> > and free memory. In the case of long connections where business queries are
> > executed over extended periods, the distribution of memory can become
> > extremely complex.
> >
> > Under certain circumstances, a common issue in memory usage due to the
> > caching strategy of malloc may arise: even if memory is released through
> > the free function, it may not be returned to the OS in a timely manner.
> > This can lead to high system memory usage, affecting performance and the
> > operation of other applications, and may even result in Out-Of-Memory (OOM)
> > errors.
> >
> > To address this issue, I have developed a new function called
> > pg_trim_backend_heap_free_memory, based on the existing
> > pg_log_backend_memory_contexts function. This function triggers the
> > specified process to execute the malloc_trim operation by sending signals,
> > thereby releasing as much unreturned memory to the operating system as
> > possible. This not only helps to optimize memory usage but can also
> > significantly enhance system performance under memory pressure.
> > > > Here is an example of using the pg_trim_backend_heap_free_memory > function to demonstrate its effect: > >> > >> CREATE OR REPLACE FUNCTION public.partition_create(schemaname character > varying, numberofpartition integer) > >> RETURNS integer > >> LANGUAGE plpgsql > >> AS $function$ > >> declare > >> currentTableId integer; > >> currentSchemaName varchar(100); > >> currentTableName varchar(100); > >> begin > >> execute 'create schema ' || schemaname; > >> execute 'create table ' || schemaname || '.' || schemaname || > 'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 > text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY > HASH(p1);'; > >> currentTableId := 1; > >> loop > >> currentTableName := schemaname || '.' || schemaname || 'hashtable' || > ltrim(currentTableId::varchar(10)); > >> execute 'create table ' || currentTableName || ' PARTITION OF ' || > schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS > ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')'; > >> currentTableId := currentTableId + 1; > >> if (currentTableId > numberofpartition) then exit; end if; > >> end loop; > >> return currentTableId - 1; > >> END $function$; > >> > >> select public.partition_create('test3', 5000); > >> select public.partition_create('test4', 5000); > >> select count(*) from test4.test4hashtable a, test3.test3hashtable b > where a.p1=b.p1; > > > > You are now about to see the memory size of the process executing the > query. 
> >> > >> postgres 68673 1.2 0.0 610456 124768 ?Ss 08:25 0:01 > postgres: postgres postgres [local] idle > >> Size: 89600 kB > >> KernelPageSize:4 kB > >> MMUPageSize: 4 kB > >> Rss: 51332 kB > >> Pss: 51332 kB > >> > >> 02b65000-082e5000 rw-p 00:00 0 > [heap] > > > > > > > > After use pg_trim_backend_heap_free_memory, you will see: > >> > >> postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid()); > >> 2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of > PID 68673 > >> pg_trim_backend_heap_free_memory > >> -- > >> t > >> (1 row) > >> 02b65000-082e5000 rw-p 00:00 0 > [heap] > >> Size: 89600 kB > >> KernelPageSize:4 kB > >> MMUPageSize: 4 kB > >> Rss:4888 kB > >> Pss:4888 kB > >> > >> postgres 68673 1.2 0.0 610456 75244 ?Ss 08:26 0:01 > postgres: postgres postgres [local] idle > > > > > > Looking forward to your feedback, > Looks useful. > > How much time does malloc_trim() take to finish? Does it affect the > current database activity in that backend? It may be good to see > effect of this function by firing the function on random backends > while the query is running through pgbench. > > In the patch I don't see definitions of > ProcessTrimHeapFreeMemoryInterrupt() and > HandleTrimHeapFreeMemoryInterrupt(). Am I missing something? > > -- > Best Wishes, > Ashutosh Bapat >
Re: Trim the heap free memory
Hi Ashutosh,

Ashutosh Bapat wrote on Mon, Aug 26, 2024 at 7:05 PM:
> Hi Shawn,
> It will be good to document usage of this function. Please add
> document changes in your patch. We need to document the impact of this
> function so that users can judiciously decide whether or not to use
> this function and under what conditions. Also they would know what to
> expect when they use this function.

I have incorporated the usage of this function into the new patch.

Currently, no memory statistic reflects precisely whether a trim operation should be performed. Here are two conditions that can be used as references:
1. Check the difference between the process's memory usage (for example, from the top command; because of shared memory, subtract SHR from RES) and the statistics of the memory contexts. If the difference is very large, this function should be used to release memory.
2. Execute malloc_stats(). If the system bytes are greater than the in-use bytes, this indicates that this function can be used to release memory.

> Running it after a query finishes is one thing but that can't be
> guaranteed because of the asynchronous nature of signal handlers.
> malloc_trim() may be called while a query is being executed. We need
> to assess that impact as well.
>
> Can you please share some numbers - TPS, latency etc. with and without
> this function invoked during a benchmark run?

I have placed malloc_trim() at the end of the exec_simple_query function, so that malloc_trim() runs once for each SQL statement executed. I used pgbench to measure the performance impact; the results are as follows.

*Database preparation:*
> create database testc;
> create user t1;
> alter database testc owner to t1;
> ./pgbench testc -U t1 -i -s 100
> ./pgbench testc -U t1 -S -c 100 -j 100 -T 600

*Without Trim*:
> $./pgbench testc -U t1 -S -c 100 -j 100 -T 600
> pgbench (18devel)
> starting vacuum...end.
> transaction type:
> scaling factor: 100
> query mode: simple
> number of clients: 100
> number of threads: 100
> maximum number of tries: 1
> duration: 600 s
> number of transactions actually processed: 551984376
> number of failed transactions: 0 (0.000%)
> latency average = 0.109 ms
> initial connection time = 23.569 ms
> tps = 920001.842189 (without initial connection time)

*With Trim :*
> $./pgbench testc -U t1 -S -c 100 -j 100 -T 600
> pgbench (18devel)
> starting vacuum...end.
> transaction type:
> scaling factor: 100
> query mode: simple
> number of clients: 100
> number of threads: 100
> maximum number of tries: 1
> duration: 600 s
> number of transactions actually processed: 470690787
> number of failed transactions: 0 (0.000%)
> latency average = 0.127 ms
> initial connection time = 23.632 ms
> tps = 784511.901558 (without initial connection time)

From 6d286d506ba5ed2dff012537766e7874952413ac Mon Sep 17 00:00:00 2001
From: Shawn Wang
Date: Wed, 28 Aug 2024 18:20:57 +0800
Subject: [PATCH] Trim the free heap Memory.

ALL processes in PostgreSQL actually use malloc to allocate and free memory. In the case of long connections where business queries are executed over extended periods, the distribution of memory can become extremely complex.

Under certain circumstances, a common issue in memory usage due to the caching strategy of malloc may arise: even if memory is released through the free function, it may not be returned to the OS in a timely manner. This can lead to high system memory usage, affecting performance and the operation of other applications, and may even result in Out-Of-Memory (OOM) errors.

Examine the difference between the memory usage of a process (for example, using the top command, where due to shared memory, it is necessary to subtract SHR from RES) and the statistics of the memory context.
If the difference is very large, or after executing malloc_stats(), if the system bytes are greater than the in-use bytes, this indicates that this process needs to release the free heap memory.
---
 doc/src/sgml/func.sgml                    | 22 +++
 src/backend/catalog/system_functions.sql  |  2 +
 src/backend/postmaster/autovacuum.c       |  4 ++
 src/backend/postmaster/checkpointer.c     |  4 ++
 src/backend/postmaster/interrupt.c        |  4 ++
 src/backend/postmaster/pgarch.c           |  4 ++
 src/backend/postmaster/startup.c          |  4 ++
 src/backend/postmaster/walsummarizer.c    |  4 ++
 src/backend/storage/ipc/procsignal.c      |  3 +
 src/backend/tcop/postgres.c               |  3 +
 src/backend/utils/adt/mcxtfuncs.c         | 54 +++
 src/backend/utils/init/globals.c          |  1 +
 src/backend/utils/mmgr/Makefile           |  1 +
 src/backend/utils/mmgr/memtrim.c          | 69 
 src
Re: Trim the heap free memory
Hi Rafia,

I have made the necessary adjustment by replacing the inclusion of malloc.h with stdlib.h in the relevant codebase. This change should address the previous concerns regarding memory allocation functions. Could you please perform another round of testing to ensure that everything is functioning as expected with this modification?

Thank you for your assistance.

Best regards,
Shawn

Rafia Sabih wrote on Wed, Sep 11, 2024 at 6:25 PM:
> Unfortunately, I still see a compiling issue with this patch,
>
> memtrim.c:15:10: fatal error: 'malloc.h' file not found
> #include <malloc.h>
>          ^~~~~~~~~~
> 1 error generated.
>
> On Wed, 28 Aug 2024 at 12:54, shawn wang wrote:
>
>> Hi Ashutosh,
>>
>> Ashutosh Bapat wrote on Mon, Aug 26, 2024 at 7:05 PM:
>>
>>> Hi Shawn,
>>> It will be good to document usage of this function. Please add
>>> document changes in your patch. We need to document the impact of this
>>> function so that users can judiciously decide whether or not to use
>>> this function and under what conditions. Also they would know what to
>>> expect when they use this function.
>>
>> I have already incorporated the usage of this function into the new patch.
>>
>> Currently, there is no memory information that can be extremely accurate to
>> reflect whether a trim operation should be performed. Here are two conditions
>> that can be used as references:
>> 1. Check the difference between the process's memory usage (for example,
>> the top command, due to the relationship with shared memory, it is necessary
>> to subtract SHR from RES) and the statistics of the memory context. If the
>> difference is very large, this function should be used to release memory;
>> 2. Execute malloc_stats(). If the system bytes are greater than the
>> in-use bytes, this indicates that this function can be used to release
>> memory.
>>
>>> Running it after a query finishes is one thing but that can't be
>>> guaranteed because of the asynchronous nature of signal handlers.
>>> malloc_trim() may be called while a query is being executed. We need >>> to assess that impact as well. >>> >>> Can you please share some numbers - TPS, latency etc. with and without >>> this function invoked during a benchmark run? >>> >> >> I have placed malloc_trim() at the end of the exec_simple_query function, >> so that malloc_trim() is executed once for each SQL statement executed. I >> used pgbench to reproduce the performance impact, >> and the results are as follows. >> *Database preparation:* >> >>> create database testc; >>> create user t1; >>> alter database testc owner to t1; >>> ./pgbench testc -U t1 -i -s 100 >>> ./pgbench testc -U t1 -S -c 100 -j 100 -T 600 >> >> *Without Trim*: >> >>> $./pgbench testc -U t1 -S -c 100 -j 100 -T 600 >>> pgbench (18devel) >>> starting vacuum...end. >>> transaction type: >>> scaling factor: 100 >>> query mode: simple >>> number of clients: 100 >>> number of threads: 100 >>> maximum number of tries: 1 >>> duration: 600 s >>> number of transactions actually processed: 551984376 >>> number of failed transactions: 0 (0.000%) >>> latency average = 0.109 ms >>> initial connection time = 23.569 ms >>> tps = 920001.842189 (without initial connection time) >> >> *With Trim :* >> >>> $./pgbench testc -U t1 -S -c 100 -j 100 -T 600 >>> pgbench (18devel) >>> starting vacuum...end. >>> transaction type: >>> scaling factor: 100 >>> query mode: simple >>> number of clients: 100 >>> number of threads: 100 >>> maximum number of tries: 1 >>> duration: 600 s >>> number of transactions actually processed: 470690787 >>> number of failed transactions: 0 (0.000%) >>> latency average = 0.127 ms >>> initial connection time = 23.632 ms >>> tps = 784511.901558 (without initial connection time) >> >> > > -- > Regards, > Rafia Sabih > v4-0001-Trim-the-free-heap-Memory.patch Description: Binary data
Re: Trim the heap free memory
Thank you for your valuable suggestion. I have successfully registered my patch for the commitfest. However, upon integration, I encountered several errors during the testing phase. I am currently investigating the root causes of these issues and will work on providing the necessary fixes. If you have any further insights or recommendations, I would greatly appreciate your guidance. Thank you once again for your support. Best regards, Shawn David Rowley wrote on Thu, 12 Sep 2024 at 16:42: > On Thu, 12 Sept 2024 at 14:40, shawn wang wrote: > > Could you please perform another round of testing to ensure that > everything is functioning as expected with this modification? > > One way to get a few machines with various build systems testing this > is to register the patch on the commitfest app in [1]. You can then > see if the patch is passing the continuous integration tests in [2]. > One day soon the features of [2] should be combined with [1]. > > David > > [1] https://commitfest.postgresql.org/50/ > [2] http://cfbot.cputube.org/ > v5-0001-PATCH-Trim-the-free-heap-Memory.patch Description: Binary data
Re: [bug] Table not have typarray when created by single user mode
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested I verified and found no problems.
Re: Problem with default partition pruning
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Hi Hosoya-san, I tested different types of key values, and multi-level partitioned tables, and found no problems. Only a stray space in src/test/regress/results/partition_prune.out caused the regression test to fail. The new status of this patch is: Waiting on Author
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Masahiko Sawada wrote on Mon, 17 Jun 2019 at 20:30: > On Fri, Jun 14, 2019 at 7:41 AM Tomas Vondra > wrote: > > I personally find the idea of encrypting tablespaces rather strange. > > Tablespaces are meant to define physical location for objects, but this > > would also use them to "mark" objects as encrypted or not. That just > > seems misguided and would make the life harder for many users. > > > > For example, what if I don't have any tablespaces (except for the > > default one), but I want to encrypt only some objects? Suddenly I have > > to create a tablespace, which will however cause various difficulties > > down the road (during pg_basebackup, etc.). > > I guess that we can have an encrypted tablespace by default (e.g. > pg_default_enc). Or we encrypt per tables while having encryption keys > per tablespaces. > Hi Sawada-san, I do agree with it. > > On Mon, Jun 17, 2019 at 6:54 AM Tomas Vondra > wrote: > > > > On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote: > > >Greetings, > > > > > >* Joe Conway (m...@joeconway.com) wrote: > > >> On 6/16/19 9:45 AM, Bruce Momjian wrote: > > >> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote: > > >> >> In any case it doesn't address my first point, which is limiting > the > > >> >> volume encrypted with the same key. Another valid reason is you > might > > >> >> have data at varying sensitivity levels and prefer different keys > be > > >> >> used for each level. > > >> > > > >> > That seems quite complex. > > >> > > >> How? It is no more complex than encrypting at the tablespace level > > >> already gives you - in that case you get this property for free if you > > >> care to use it. > > > > > >Perhaps not surprising, but I'm definitely in agreement with Joe > > >regarding having multiple keys when possible and (reasonably) > > >straight-forward to do so. 
I also don't buy off on the OpenSSL > > >argument; their more severe issues certainly haven't been due to key > > >management issues such as what we're discussing here, so I don't think > > >the argument applies. > > > > > > > I'm not sure what exactly is the "OpenSSL argument" you're disagreeing > > with? IMHO Bruce is quite right that the risk of vulnerabilities grows > > with the complexity of the system (both due to implementation bugs and > > general design weaknesses). I don't think it's tied to the key > > management specifically, except that it's one of the parts that may > > contribute to the complexity. > > > > (It's often claimed that key management is one of the weakest points of > > current crypto systems - we have safe (a)symmetric algorithms, but safe > > handling of keys is an issue. I don't have data / papers supporting this > > claim, I kinda believe it.) > > > > Now, I'm not opposed to eventually implementing something more > > elaborate, but I also think just encrypting the whole cluster (not > > necessarily with a single key, but with one master key) would be enough > > for vast majority of users. Plus it's less error prone and easier to > > operate (backups, replicas, crash recovery, ...). > > > > But there's about 0% chance we'll get that in v1, of course, so we need > > a "minimum viable product" to build on anyway. > > > > I agree that we need minimum viable product first. But I'm not sure > it's true that implementing the cluster-wide TDE first could be > the first step of per-tablespace/table TDE. > Yes, we could complete the per-tablespace/table TDE in version 13. And we could do cluster-wide TDE in the next version. But I remember you said there are so many keys to manage in the table-level. Will we add the table-level TDE in the first version? And I have two questions. 1. Will we add hooks to support replacing the encryption algorithms? 2. Will we add some encryption algorithm or use some in some libraries? 
Regards, -- Shawn Wang HIGHGO SOFTWARE > The purpose of cluster-wide TDE and table/tablespace TDE are slightly > different in terms of encryption target objects. The cluster-wide TDE > would be a good solution for users who want to encrypt everything > while the table/tablespace TDE would help more severe use cases in > terms of both of security and performance. > > The cluster-wide TDE eventually encrypts SLRU data and all WAL > including non-user data related WAL while table/tablespace TDE doesn't > unless we develop such functionality. In addition, the cluster-wide > TDE also encrypts system catalogs but in table/tablespace TDE user > would be able to control that somewhat. That is, if we developed the > cluster-wide TDE first, when we develop table/tablespace TDE on top of > that we would need to change TDE so that table/tablespace TDE can > encrypt even non-user data related data while retaining its simple > user interface, which would rather make the feature complex, I'm > concerned. We can support them as different TDE features but I'm not > sure it's a good choice for users. > > From a cryptographic perspective, I think the fine grained TD
Re: Problem with default partition pruning
Hi Alvaro, Thank you for your reply. You can see that the mail start time is February 22. So I looked at the latest version at that time. I found that v11.2 was the newest branch at the time. So I tried to merge this patch into the code, and I found that everything worked. So I tested on this branch and got the results. You need to add the v4_default_partition_pruning.patch <https://www.postgresql.org/message-id/attachment/100463/v4_default_partition_pruning.patch> first, and then add the v3_ignore_contradictory_where_clauses_at_partprune_step.patch <https://www.postgresql.org/message-id/attachment/100591/v3_ignore_contradictory_where_clauses_at_partprune_step.patch> . Otherwise, you will find some errors. I hope this helps you. Regards. -- Shawn Wang Alvaro Herrera wrote on Sat, 22 Jun 2019 at 04:03: > On 2019-Jun-17, Shawn Wang wrote: > > > I tested different types of key values, and multi-level partitioned > tables, and found no problems. > > Only the SQL in the file of src/test/regress/results/partition_prune.out > has a space that caused the regression test to fail. > > It's not clear to me what patch were you reviewing. The latest patch I > see in this thread, in [1], does not apply in any branches. 
As another > test, I tried to apply the patch on commit 489e431ba56b (before Tom's > partprune changes in mid May); if you use "patch -p1 > --ignore-whitespace" it is accepted, but the failure case proposed at > the start of the thread shows the same behavior (namely, that test1_def > is scanned when it is not needed): > > 55432 12devel 23506=# create table test1(id int, val text) partition by > range (id); > create table test1_1 partition of test1 for values from (0) to (100); > create table test1_2 partition of test1 for values from (150) to (200); > create table test1_def partition of test1 default; > explain select * from test1 where id > 0 and id < 30; > CREATE TABLE > Duration: 5.736 ms > CREATE TABLE > Duration: 5.622 ms > CREATE TABLE > Duration: 3.585 ms > CREATE TABLE > Duration: 3.828 ms >QUERY PLAN > ─ > Append (cost=0.00..58.16 rows=12 width=36) >-> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36) > Filter: ((id > 0) AND (id < 30)) >-> Seq Scan on test1_def (cost=0.00..29.05 rows=6 width=36) > Filter: ((id > 0) AND (id < 30)) > (5 rows) > > Duration: 2.465 ms > > > [1] https://postgr.es/m/00cf01d4eea7$afa43370$0eec9a50$@lab.ntt.co.jp > > -- > Álvaro Herrera    https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > >
Re: Trim the heap free memory
Thank you very much for your response and suggestions. As you mentioned, the patch here is actually designed for glibc's ptmalloc2 and is not applicable to other platforms. I will consider supporting it only on the Linux platform in the future. In the memory management strategy of ptmalloc2, there is a certain amount of non-garbage-collected memory, which is closely related to the order and method of memory allocation and release. To reduce the performance overhead caused by frequent allocation and release of small blocks of memory, ptmalloc2 intentionally retains this part of the memory. The malloc_trim function locks, traverses memory blocks, and uses madvise to release this part of the memory, but this process may also have a negative impact on performance. In the process of exploring solutions, I also considered a variety of strategies, including scheduling malloc_trim to be executed at regular intervals or triggering malloc_trim after a specific number of free operations. However, we found that these methods are not optimal solutions. > We can see that out of about 43K test queries, 32K saved nothing > whatever, and in only four was more than a couple of meg saved. > That's pretty discouraging IMO. It might be useful to look closer > at the behavior of those top four though. I see them as I have previously encountered situations where the non-garbage-collected memory of wal_sender was approximately hundreds of megabytes or even exceeded 1GB, but I was unable to reproduce this situation using simple SQL. Therefore, I introduced an asynchronous processing function, hoping to manage memory more efficiently without affecting performance. In addition, I have considered the following optimization strategies: 1. Adjust the configuration of ptmalloc2 through the mallopt function to use mmap rather than sbrk for memory allocation. 
This can immediately return the memory to the operating system when it is released, but it may affect performance due to the higher overhead of mmap. 2. Use other memory allocators such as jemalloc or tcmalloc, and adjust relevant parameters to reduce the generation of non-garbage-collected memory. However, these allocators are designed for multi-threaded use and may lead to increased memory usage per process. 3. Build a set of memory-context allocation functions based on mmap, delegating the responsibility of memory management entirely to the database level. Although this solution can effectively control memory allocation, it requires a large-scale engineering implementation. I look forward to further discussing these solutions with you and exploring the best memory management practices together. Best regards, Shawn Tom Lane wrote on Mon, 16 Sep 2024 at 03:16: > I wrote: > > The single test case you showed suggested that maybe we could > > usefully prod glibc to free memory at query completion, but we > > don't need all this interrupt infrastructure to do that. I think > > we could likely get 95% of the benefit with about a five-line > > patch. > > To try to quantify that a little, I wrote a very quick-n-dirty > patch to apply malloc_trim during finish_xact_command and log > the effects. (I am not asserting this is the best place to > call malloc_trim; it's just one plausible possibility.) Patch > attached, as well as statistics collected from a run of the > core regression tests followed by > > grep malloc_trim postmaster.log | sed 's/.*LOG:/LOG:/' | sort -k4n | uniq > -c >trim_savings.txt > > We can see that out of about 43K test queries, 32K saved nothing > whatever, and in only four was more than a couple of meg saved. > That's pretty discouraging IMO. It might be useful to look closer > at the behavior of those top four though. 
I see them as > > 2024-09-15 14:58:06.146 EDT [960138] LOG: malloc_trim saved 7228 kB > 2024-09-15 14:58:06.146 EDT [960138] STATEMENT: ALTER TABLE > delete_test_table ADD PRIMARY KEY (a,b,c,d); > > 2024-09-15 14:58:09.861 EDT [960949] LOG: malloc_trim saved 12488 kB > 2024-09-15 14:58:09.861 EDT [960949] STATEMENT: with recursive > search_graph(f, t, label, is_cycle, path) as ( > select *, false, array[row(g.f, g.t)] from graph g > union distinct > select g.*, row(g.f, g.t) = any(path), path || row(g.f, > g.t) > from graph g, search_graph sg > where g.f = sg.t and not is_cycle > ) > select * from search_graph; > > 2024-09-15 14:58:09.866 EDT [960949] LOG: malloc_trim saved 12488 kB > 2024-09-15 14:58:09.866 EDT [960949] STATEMENT: with recursive > search_graph(f, t, label) as ( > select * from graph g > union distinct > select g.* > from graph g, search_graph sg > where g.f = sg.t > ) cycle f, t set is_cycle to 'Y' default 'N' using path > select * from search_graph; > > 2024-09-15 14:58:09.853