vacuum freeze - possible improvements
Hi Postgres Community,

Regarding anti-wraparound vacuums (to freeze tuples), I see that VACUUM has to scan all pages that are not marked all-frozen (looking at the visibility map). That means even if we want to freeze only a smaller set of transactions (for example, by increasing the parameter vacuum_freeze_min_age to 1B), it still scans all those pages, which is a time-consuming process.

Can this process be improved so that VACUUM knows which tuples/pages contain the transactions that need freezing? The benefit of such an improvement is that, if we are approaching transaction ids close to 2B (and facing downtime), we could recover the database quickly by freezing only a few million rows via a quick lookup rather than walking all the pages listed in the visibility map.

For example, a binary-tree structure holding all rows of a table together with their transaction ids: then we could say "freeze all tuples having a transaction id greater than x and less than y". Yes, that adds extra overhead on data load, and there are lots of other things to consider.

Thanks,
Virender
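For reference, a minimal sketch of the situation I am describing, using only standard catalog queries (the table name big_table is assumed for illustration):

-- How close tables are to wraparound, and which ones a freeze vacuum must visit first.
SELECT c.relname,
       age(c.relfrozenxid)                       AS xid_age,
       pg_size_pretty(pg_relation_size(c.oid))   AS size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 5;

-- Raising the freeze threshold for one manual run (note it is effectively capped
-- at half of autovacuum_freeze_max_age), then vacuuming:
SET vacuum_freeze_min_age = 1000000000;
VACUUM (VERBOSE) big_table;   -- still scans every page that is not all-frozen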
Re: vacuum freeze - possible improvements
Thanks Masahiko for the response.

"What is the use case where users want to freeze fewer transactions, meaning invoking anti-wraparound frequently?"

My overall focus here is anti-wraparound vacuum on huge tables in emergency situations (where we have come very close to 2B transactions or are already in an outage window). In this situation we want to recover ASAP instead of having many hours of outage. The purpose of increasing vacuum_freeze_min_age to a high value is that the anti-wraparound vacuum has to do less work, because we are asking it to freeze fewer transactions/tuples (of course a subsequent vacuum has to do the remaining work).

"So the vacuum freeze will still have to process tuples that are inserted/modified during consuming 1 billion transactions. It seems to me that it’s not fewer transactions."

Yes. Another thing here is that the anti-wraparound vacuum also cleans dead tuples, and I am not sure what we can do to avoid that. Could there be a vacuum mode that only freezes the tuples?

Thanks for sharing the PG14 improvements, those are nice to have. But the anti-wraparound vacuum will still have to scan all the pages (per the visibility map) even if we are freezing fewer transactions, because currently there is no way to know which block/tuple contains which transaction id. If there were a way, it would be easy to directly freeze those tuples quickly and advance relfrozenxid for the table.

On Tue, Apr 13, 2021 at 7:52 AM Masahiko Sawada wrote:
> On Mon, Apr 12, 2021 at 5:38 PM Virender Singla wrote:
> >
> > Hi Postgres Community,
> >
> > Regarding anti wraparound vacuums (to freeze tuples), I see it has to scan all the pages which are not frozen-all (looking at visibility map). That means even if we want to freeze less transactions only (For ex - by increasing parameter vacuum_freeze_min_age to 1B), still it will scan all the pages in the visibility map and a time taking process.
>
> If vacuum_freeze_min_age is 1 billion, autovacuum_freeze_max_age is 2 billion (vacuum_freeze_min_age is limited to the half of autovacuum_freeze_max_age). So vacuum freeze will still have to process tuples that are inserted/modified during consuming 1 billion transactions. It seems to me that it’s not fewer transactions. What is the use case where users want to freeze fewer transactions, meaning invoking anti-wraparound frequently?
>
> > Can there be any improvement on this process so VACUUM knows the tuple/pages of those transactions which need to freeze up.
> >
> > Benefit of such an improvement is that if we are reaching transaction id close to 2B (and downtime), that time we can quickly recover the database with vacuuming freeze only a few millions rows with quick lookup rather than going all the pages from visibility map.
>
> Apart from this idea, in terms of speeding up vacuum, vacuum_failsafe_age parameter, introduced to PG14[1], would also be helpful. When the failsafe is triggered, cost-based delay is no longer applied, and index vacuuming is bypassed in order to finish vacuum work and advance relfrozenxid as quickly as possible.
>
> Regards
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e55e7d1755cefbb44982fbacc7da461fa8684e6
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
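As a hedged sketch of the PG14 failsafe mentioned above (the table name big_table and the chosen threshold are assumptions for illustration; the shipped default is 1.6 billion):

-- Lower the failsafe threshold so an emergency vacuum drops cost-based delay
-- and skips index vacuuming/cleanup sooner (PostgreSQL 14+).
ALTER SYSTEM SET vacuum_failsafe_age = 1200000000;
SELECT pg_reload_conf();

-- A manual freeze run on the table closest to wraparound; once the failsafe
-- triggers, the run finishes and advances relfrozenxid as quickly as possible.
VACUUM (FREEZE, VERBOSE) big_table;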
Re: vacuum freeze - possible improvements
Exactly my point: we want to scan only 500GB of data instead of 1TB. That can be handy for a vacuum freeze at a dangerous stage (approaching 2B).

"Maybe we can track the oldest xid per page in a map like visibility map or integrate it with visibility map. We need to freeze only pages that are all-visible and whose oldest xid is older than the cut-off xid. I think we need to track both xid and multi xid."

Yes, I thought of that (keeping track of the oldest xid per page instead of per tuple); the only catch is that every time the page is modified, the oldest xid has to be recalculated for that page. Still, that makes sense with a BRIN-type structure keeping the xid per page. With a binary-tree index structure, new transactions/tuples would always land on the right side (as those are the newest transactions, until 2B) and the leaf blocks on the other side could be removed by every vacuum freeze.

On Tue, Apr 13, 2021 at 6:02 PM Masahiko Sawada wrote:
> On Tue, Apr 13, 2021 at 1:51 PM Virender Singla wrote:
> >
> > Thanks Masahiko for the response.
> >
> > "What is the use case where users want to freeze fewer transactions, meaning invoking anti-wraparound frequently?"
> >
> > My overall focus here is anti wraparound vacuum on huge tables in emergency situations (where we reached very close to 2B transactions or already in outage window). In this situation we want to recover ASAP instead of having many hours of outage. The purpose of increasing "vacuum_freeze_min_age" to a high value is that anti wraparound vacuum will have to do less work because we are asking fewer transactions/tuples to freeze (of course subsequent vacuum has to do the remaining work).
>
> I think I understood your proposal. For example, if we insert 500GB tuples during the first 1 billion transactions and then insert more 500GB tuples into another 500GB blocks during the next 1 billion transactions, vacuum freeze scans 1TB whereas we scan only 500GB that are modified by the first insertions if we’re able to freeze directly tuples that are older than the cut-off. Is that right?
>
> > "So the vacuum freeze will still have to process tuples that are inserted/modified during consuming 1 billion transactions. It seems to me that it’s not fewer transactions."
> >
> > Yes another thing here is anti wraparound vacuum also cleans dead tuples but I am not sure what we can do to avoid that.
> > There can be vacuum to only freeze the tuples?
>
> I think it's a good idea to skip all work except for freezing tuples in emergency cases. Thanks to vacuum_failsafe_age we can avoid index vacuuming, index cleanup, and heap vacuuming.
>
> > Thanks for sharing PG14 improvements, those are nice to have. But still the anti wraparound vacuum will have to scan all the pages (from visibility map) even if we are freezing fewer transactions because currently there is no way to know what block/tuple contains which transaction id.
>
> Yes, that feature is to speed up vacuum by dynamically disabling both cost-based delay and some cleanup work whereas your idea is to do that by speeding up heap scan.
>
> > If there is a way then it would be easier to directly freeze those tuples quickly and advance the relfrozenxid for the table.
>
> Maybe we can track the oldest xid per page in a map like visibility map or integrate it with visibility map. We need to freeze only pages that are all-visible and whose oldest xid is older than the cut-off xid. I think we need to track both xid and multi xid.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
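As a side note, the existing visibility map can at least show how much of the heap an aggressive vacuum really has to scan today; a small sketch using the contrib module pg_visibility (the table name big_table is assumed):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Pages already marked all-frozen are skipped by an aggressive vacuum; every
-- other page must still be read, whatever vacuum_freeze_min_age is set to.
SELECT count(*) FILTER (WHERE all_frozen)     AS skippable_pages,
       count(*) FILTER (WHERE NOT all_frozen) AS pages_to_scan
FROM pg_visibility_map('big_table');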
TOAST - why separate visibility map
Why do TOAST tables have their own visibility map and xmin, xmax columns etc.? Isn't that increasing the row size in a TOAST table and adding more complexity?

Ideally all the vacuum cleanup on a TOAST table could be done based on the primary table's xmin, xmax and VM info. Yes, that would glue any cleanup on the TOAST table to the primary table.
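For reference, a small illustrative query (the table name big_table is assumed, and it is assumed to have TOAST-able columns) showing that every regular table carries its own TOAST relation, which is a normal heap with its own system columns and visibility map fork:

SELECT c.relname AS main_table,
       t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'big_table';

-- The TOAST heap can be queried like any other table, including its xmin/xmax,
-- e.g. (hypothetical relation name):
--   SELECT xmin, xmax, chunk_id, chunk_seq FROM pg_toast.pg_toast_16384 LIMIT 5;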
Re: TOAST - why separate visibility map
Another point: currently a manual VACUUM job does cleanup/freeze on the primary table first and then on the TOAST table. It looks like an easy pick to have a configurable parameter to run it on both tables in parallel.

On Sat, Nov 20, 2021 at 12:15 AM Virender Singla wrote:
> Why do TOAST tables have their own visibility map and xmin, xmax columns etc.?
> Isn't that increasing the row size in a TOAST table and adding more complexity?
>
> Ideally all the vacuum cleanup on a TOAST table could be done based on the primary table's xmin, xmax and VM info. Yes, that would glue any cleanup on the TOAST table to the primary table.
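A hedged sketch of what is already possible today with the PROCESS_TOAST option (PostgreSQL 14+): running the main heap and the TOAST heap from two separate sessions instead of strictly one after the other. The relation names here are assumptions for illustration:

-- Session 1: vacuum/freeze only the main heap.
VACUUM (FREEZE, PROCESS_TOAST FALSE, VERBOSE) big_table;

-- Session 2 (concurrently): vacuum the TOAST relation directly.
-- Find its name first:
--   SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'big_table';
VACUUM (FREEZE, VERBOSE) pg_toast.pg_toast_16384;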
Re: TOAST - why separate visibility map
"Given the size of toasted data, the overhead is unlikely to be a significant overhead. It's much more an issue for the main table, where narrow rows are common." Completely agree, row size should not be a big concern for toast tables. However write amplification will happen with vacuum freeze where transactions id need to freeze in wider toast table tuples as well. I have not explored if TOAST has separate hint bits info as well. In that case it means normal vacuum (or SELECT after WRITE) has to completely rewrite the big toast table tuples along with the small main table to set the hint bits (commit/rollback). I believe B tree Index does not contain any seperate visibility info so that means the only work VACUUM does on Indexes is cleaning up dead tuples. With maintaining one visibility info, above operations could be way faster. However now the main table and TOAST vacuuming process will be glued together where optimization can be thought about like two synchronized threads working together for main and TOAST table to do the cleanup job. Agree that hot updates are gone in TOAST if there is a common VM. Overall this looks complex. On Sat, Nov 20, 2021 at 9:46 PM Tom Lane wrote: > Andres Freund writes: > > On November 19, 2021 12:31:00 PM PST, Tom Lane > wrote: > >> It might be feasible to drop the visibility map for toast tables, > though. > > > I think it be a bad idea - the VM is used by vacuum to avoid rereading > already vacuumed ranges. Loosing that for large toast tables would be bad. > > Ah, right. I was thinking vacuuming depended on the other map fork, > but of course it needs this one. > > In short, there are indeed good reasons why it works like this. > > regards, tom lane >
vacuum on table1 skips rows because of a query on table2
Currently I see that vacuum on a table skips dead rows even when the long-running query in another read committed transaction is on a different table: the vacuum skips those dead rows until the long-running query finishes. Why is that the case? A long-running query on the same table blocking vacuum is understandable, but why does a query on a different table block it?
Re: vacuum on table1 skips rows because of a query on table2
If the long-running transaction is read committed, then we know that any new query it issues (even on the same table1 that vacuum is processing) will take its snapshot at the point the query starts, not when the transaction started (but still, why does a read committed transaction on table2 cause vacuum on table1 to skip rows?). Hence, if a vacuum on table1 sees that all transactions in the database are read committed and no one is accessing table1, it should be able to clear the dead rows. For read committed transactions, different tables should not interfere with each other.

On Fri, Oct 25, 2019 at 10:16 PM Tom Lane wrote:
> Virender Singla writes:
> > Currently I see the vacuum behavior for a table is that, even if a long
> > running query on a different table is executing in another read committed
> > transaction.
> > That vacuum in the 1st transaction skips the dead rows until the long
> > running query finishes.
> > Why that is the case, On same table long running query blocking vacuum we
> > can understand but why query on a different table block it.
>
> Probably because vacuum's is-this-row-dead-to-everyone tests are based
> on the global xmin minimum. This must be so, because even if the
> long-running transaction hasn't touched the table being vacuumed,
> we don't know that it won't do so in future. So we can't remove
> rows that it should be able to see if it were to look.
>
> regards, tom lane
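A minimal reproduction sketch of the behaviour described above (the table names table1/table2 and the deliberately slow query are assumptions for illustration):

-- Session 1: a long-running read committed query on an unrelated table.
SELECT count(*) FROM table2 a CROSS JOIN table2 b;   -- deliberately slow

-- Session 2: meanwhile, create dead tuples on table1 and try to clean them.
DELETE FROM table1 WHERE id < 100000;
VACUUM (VERBOSE) table1;
-- The VERBOSE output reports dead row versions that "cannot be removed yet":
-- vacuum's removal cutoff is the global xmin, which session 1's open snapshot
-- holds back even though it never touched table1.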
invoker function security issues
I believe functions in Postgres follow a late-binding approach, so nested function dependencies are resolved through search_path at run time. This way a user can override nested functions in their own schema and change the behaviour of wrapper functions. A more serious issue arises when functional indexes (with nested function calls) are created on a table: the data inserted into the index can depend entirely on which user inserts the rows (by overriding a nested function). I ran a couple of test cases where the indexed data depends on the user overriding the nested function.

I understand it is not best practice to scatter functions/indexes/tables across different schemas and use such a schema setup, but I would still expect Postgres to save us from such data-inconsistency issues by using early binding for functional indexes. In fact Postgres does that linking for a single-function index (where no nested functions are involved) and qualifies the function used in the index with its schema name, and it also works when all functions, the table, and the indexes are in the same schema. However, there are still cases where functional indexes are created on extension functions (for example, the cube extension) that live in a different schema, and those functions are defined with invoker security and make nested function calls without any schema qualification.

The issue that arises from late binding for functional indexes is that when we migrate such tables/indexes/data from one database to another (using pg_dump/pg_restore or any other method), the data can change depending on which user we use for the import. (I performed these tests with invoker functions; I think definer functions produce the correct behaviour.) One way out would be to define search_path for such nested functions, as sketched after the test cases below.

==Case1==

##Table and functions are in different schemas.

Session1::
User: postgres

create user idxusr1 with password '*';
grant idxusr1 to postgres;
create schema idxusr1 AUTHORIZATION idxusr1;

create user idxusr2 with password '*';
grant idxusr2 to postgres;
create schema idxusr2 AUTHORIZATION idxusr2;

Session2::
User: idxusr1

set search_path to idxusr1,public;

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';

CREATE FUNCTION wrapsum(int, int) RETURNS int LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';

##create table in another schema

create table public.test(n1 int);
create unique index idxtst on public.test(idxusr1.wrapsum(n1,1));

grant insert on table public.test to idxusr2;

postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.

Session3::
User: idxusr2

set search_path to idxusr2,public;

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1 - $2)';

postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (idxusr1.wrapsum(n1, 1))=(0) already exists.

==Case2==

##Functions are in different schemas.

Session1::
User: postgres

create user idxusr1 with password '*';
grant idxusr1 to postgres;
create schema idxusr1 AUTHORIZATION idxusr1;

create user idxusr2 with password '*';
grant idxusr2 to postgres;
create schema idxusr2 AUTHORIZATION idxusr2;

Session2::
User: idxusr1

set search_path to idxusr1,public;

##create internal function in own schema and wrapper function in another schema.

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';

CREATE FUNCTION public.wrapsum(int, int) RETURNS int LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';

create table test(n1 int);
create unique index idxtst on test(public.wrapsum(n1,1));

grant usage on schema idxusr1 to idxusr2;
grant insert on table test to idxusr2;

postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.

Session3::
User: idxusr2

set search_path to idxusr2,public;

CREATE
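A hedged sketch of the mitigation suggested above: pinning search_path so the nested call always resolves to the intended schema, regardless of who inserts or restores the data. The names mirror the second test case and are assumptions, not the only way to do it:

-- Resolve sumcall() at call time against a fixed schema, not the caller's
-- search_path; pg_temp is placed last as the usual hardening step.
ALTER FUNCTION public.wrapsum(int, int) SET search_path = idxusr1, pg_temp;

-- Or schema-qualify the nested call directly in the function body:
CREATE OR REPLACE FUNCTION public.wrapsum(int, int) RETURNS int
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE
AS 'SELECT idxusr1.sumcall($1, $2)';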
Postgres default FILLFACTOR value
Why Postgres default FILLFACTOR for table is 100 and for Index is 90. Although Oracle is having completely different MVCC architecture, it uses default 90 for table and 100 for Index (exact reverse of Postgres) Postgres blocks needed more spaces for row update compares to Oracle (because Oracle keeps buffer space only for row expansion, whereas Postgres need to create new versioned row). As I see Postgres is more suitable for OLTP workload, keeping TABLE FILLFACTOR value to 90 is more suitable rather than stressing to save storage space. Less FILLFACTOR value will be useful to make UPDATEs as HOT applicable as well and that is going to benefit new Postgres adopting users who are initially not aware of such setting and only realize this later when VACUUM are really running long and Indexes gets bloated. . Other side Index FILLFACTOR makes sense only for existing populated tables and for any row (new INSERTs or INSERT coming through UPDATEs), it can fill the block above FILLFACTOR value. I think 100 default make more sense here.
Re: Postgres default FILLFACTOR value
In Postgres, index FILLFACTOR really only matters for monotonically increasing column values; for random values the engine does roughly 50:50 page splits anyway. And it is quite unlikely that a monotonically increasing column gets updated, so why waste that 10% of space and make index range scans on such tables less performant?

postgres=> create table test(id bigint);
CREATE TABLE
postgres=> CREATE INDEX idx1_test ON test (id) with (fillfactor = 100);
CREATE INDEX
postgres=> CREATE INDEX idx2_test ON test (id);  -- defaults to 90
CREATE INDEX
postgres=> insert into test SELECT ceil(random() * 1000) from generate_series(1, 1000) AS temp (id);
INSERT 0 1000

postgres=> \di+ idx1_test
                             List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx1_test | index | postgres | test  | 278 MB |

postgres=> \di+ idx2_test
                             List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx2_test | index | postgres | test  | 280 MB |

postgres=> update test set id = id+1 where id%100=0;
UPDATE 99671

postgres=> \di+ idx1_test
                             List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx1_test | index | postgres | test  | 281 MB |

postgres=> \di+ idx2_test
                             List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx2_test | index | postgres | test  | 282 MB |

On Fri, May 8, 2020 at 1:50 PM Virender Singla wrote:
> Why is the Postgres default FILLFACTOR 100 for tables and 90 for indexes?
>
> Although Oracle has a completely different MVCC architecture, it defaults to 90 for tables and 100 for indexes (the exact reverse of Postgres).
>
> Postgres blocks need more free space for row updates than Oracle's do (Oracle keeps buffer space only for row expansion, whereas Postgres has to create a new row version). Since Postgres is well suited for OLTP workloads, keeping the table FILLFACTOR at 90 seems more appropriate than stressing about saving storage space. A lower FILLFACTOR also helps keep UPDATEs HOT-applicable, which would benefit new Postgres users who are initially unaware of this setting and only realize it later when VACUUMs run long and indexes get bloated.
>
> On the other hand, index FILLFACTOR matters only for already-populated pages; new rows (fresh INSERTs or INSERTs coming through UPDATEs) can fill a block beyond the FILLFACTOR value. I think the default of 100 makes more sense there.