vacuum freeze - possible improvements

2021-04-12 Thread Virender Singla
Hi Postgres Community,

Regarding anti-wraparound vacuums (to freeze tuples), I see that VACUUM has to
scan every page that is not marked all-frozen in the visibility map. That
means even if we want to freeze only a narrow range of transactions (for
example, by increasing the parameter vacuum_freeze_min_age to 1B), it still
scans all of those pages, which is a time-consuming process.

Could this process be improved so that VACUUM knows which tuples/pages
contain the transactions that need freezing?

The benefit of such an improvement is that if we are approaching a transaction
id close to 2B (and a downtime), we could quickly recover the database by
freeze-vacuuming only a few million rows found via a direct lookup, rather
than visiting every page flagged in the visibility map.

For example: a binary-tree structure holding all the rows of a table together
with their transaction ids, so that we could say "freeze all tuples having a
transaction id greater than x and less than y". Yes, that adds extra overhead
on data load, and there are lots of other things to consider.
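
For context, how close each table is to the 2B limit is already visible from
the catalogs; this is a standard query, nothing hypothetical:

-- tables with the oldest relfrozenxid, i.e. the biggest freeze debt
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 5;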


Thanks,
Virender


Re: vacuum freeze - possible improvements

2021-04-13 Thread Virender Singla
Thanks Masahiko for the response.

"What is
the use case where users want to freeze fewer transactions, meaning
invoking anti-wraparound frequently?"

My overall focus here is anti-wraparound vacuum on huge tables in emergency
situations (where we are very close to 2B transactions, or already inside an
outage window). In this situation we want to recover ASAP instead of taking
many hours of outage. The purpose of increasing "vacuum_freeze_min_age" to a
high value is that the anti-wraparound vacuum then has less work to do,
because we are asking it to freeze fewer transactions/tuples (of course a
subsequent vacuum has to do the remaining work).
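
A minimal sketch of that emergency run ("big_table" is a placeholder; and as
noted below, the setting is silently capped at half of
autovacuum_freeze_max_age):

SET vacuum_freeze_min_age = 1000000000;
-- freezes only tuples whose xmin is older than the ~1B cutoff,
-- leaving younger tuples for a later, less urgent vacuum
VACUUM (VERBOSE) big_table;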

"So the vacuum freeze will still have to
process tuples that are inserted/modified during consuming 1 billion
transactions. It seems to me that it’s not fewer transactions."

Yes, another thing here is that anti-wraparound vacuum also cleans dead
tuples, but I am not sure what we can do to avoid that.
Could there be a vacuum that only freezes the tuples?

Thanks for sharing the PG14 improvements; those are nice to have. But the
anti-wraparound vacuum will still have to scan all the pages (per the
visibility map) even if we are freezing fewer transactions, because currently
there is no way to know which block/tuple contains which transaction id. If
there were a way, it would be easy to freeze exactly those tuples quickly and
advance the relfrozenxid for the table.
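
Per-tuple xmin can be inspected today with the contrib pageinspect extension,
but only by reading every page, which is exactly the cost this idea wants to
avoid ("test" is an illustrative table name):

CREATE EXTENSION IF NOT EXISTS pageinspect;
-- line pointer and xmin of every tuple in block 0 of "test"
SELECT lp, t_xmin FROM heap_page_items(get_raw_page('test', 0));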


On Tue, Apr 13, 2021 at 7:52 AM Masahiko Sawada 
wrote:

> On Mon, Apr 12, 2021 at 5:38 PM Virender Singla 
> wrote:
> >
> > Hi Postgres Community,
> >
> > Regarding anti wraparound vacuums (to freeze tuples), I see it has to
> scan all the pages which are not frozen-all (looking at visibility map).
> That means even if we want to freeze less transactions only (For ex - by
> increasing parameter vacuum_freeze_min_age to 1B), still it will scan all
> the pages in the visibility map and a time taking process.
>
>  If vacuum_freeze_min_age is 1 billion, autovacuum_freeze_max_age is 2
> billion (vacuum_freeze_min_age is limited to the half of
> autovacuum_freeze_max_age). So vacuum freeze will still have to
> process tuples that are inserted/modified during consuming 1 billion
> transactions. It seems to me that it’s not fewer transactions. What is
> the use case where users want to freeze fewer transactions, meaning
> invoking anti-wraparound frequently?
>
> >
> > Can there be any improvement on this process so VACUUM knows the
> tuple/pages of those transactions which need to freeze up.
> >
> > Benefit of such an improvement is that if we are reaching transaction id
> close to 2B (and downtime), that time we can quickly recover the database
> with vacuuming freeze only a few millions rows with quick lookup rather
> than going all the pages from visibility map.
>
> Apart from this idea, in terms of speeding up vacuum,
> vacuum_failsafe_age parameter, introduced to PG14[1], would also be
> helpful. When the failsafe is triggered, cost-based delay is no longer
> be applied, and index vacuuming is bypassed in order to finish vacuum
> work and advance relfrozenxid as quickly as possible.
>
> Regards
>
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e55e7d1755cefbb44982fbacc7da461fa8684e6
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


Re: vacuum freeze - possible improvements

2021-04-13 Thread Virender Singla
Exactly my point: we want to scan only 500GB of data instead of 1TB. That can
be handy for vacuum freeze at a dangerous stage (heading towards 2B).

"Maybe we can track the oldest xid per page in a map like visiblity map
or integrate it with visibility map. We need to freeze only pages that
are all-visible and whose oldest xid is older than the cut-off xid. I
think we need to track both xid and multi xid."

Yes, I thought of that (keeping track of the oldest xid per page instead of
per tuple); the only catch is that every time the page is modified, its oldest
xid has to be recalculated. Still, that makes sense with a BRIN-type structure
keeping the oldest xid per page.
With a binary-tree index structure, a new transaction/tuple would always land
on the right side (since it is the newest transaction, until 2B), and the leaf
blocks on the other side could be removed with every vacuum freeze.
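
For comparison, the page-level all-frozen flag that exists today can already
be read with the contrib pg_visibility extension ("test" is illustrative);
what is missing is precisely the per-page oldest xid:

CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT count(*) FILTER (WHERE all_frozen)     AS all_frozen_pages,
       count(*) FILTER (WHERE NOT all_frozen) AS pages_vacuum_must_scan
FROM pg_visibility_map('test');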




On Tue, Apr 13, 2021 at 6:02 PM Masahiko Sawada 
wrote:

> On Tue, Apr 13, 2021 at 1:51 PM Virender Singla 
> wrote:
> >
> > Thanks Masahiko for the response.
> >
> > "What is
> > the use case where users want to freeze fewer transactions, meaning
> > invoking anti-wraparound frequently?"
> >
> > My overall focus here is anti wraparound vacuum on huge tables in
> emergency situations (where we reached very close to  2B transactions or
> already in outage window). In this situation we want to recover ASAP
> instead of having many hours of outage.The Purpose of increasing
> "vacuum_freeze_min_age" to high value is that anti wraparound vacuum will
> have to do less work because we are asking less transactions/tuples to
> freeze (Of Course subsequent vacuum has to do the remaining work).
>
> I think I understood your proposal. For example, if we insert 500GB
> tuples during the first 1 billion transactions and then insert more
> 500GB tuples into another 500GB blocks during the next 1 billion
> transactions, vacuum freeze scans 1TB whereas we scans only 500GB that
> are modified by the first insertions if we’re able to freeze directly
> tuples that are older than the cut-off. Is that right?
>
> >
> > "So the vacuum freeze will still have to
> > process tuples that are inserted/modified during consuming 1 billion
> > transactions. It seems to me that it’s not fewer transactions."
> >
> > Yes another thing here is anti wraparound vacuum also cleans dead tuples
> but i am not sure what we can do to avoid that.
> > There can be vacuum to only freeze the tulpes?
>
> I think it's a good idea to skip all work except for freezing tuples
> in emergency cases. Thanks to vacuum_failsafe_age we can avoid index
> vacuuming, index cleanup, and heap vacuuming.
>
> >
> > Thanks for sharing PG14 improvements, those are nice to have. But still
> the anti wraparound vacuum will have to scan all the pages (from visibility
> map) even if we are freezing fewer transactions because currently there is
> no way to know what block/tuple contains which transaction id.
>
> Yes, that feature is to speed up vacuum by dynamically disabling both
> cost-based delay and some cleanup work whereas your idea is to do that
> by speeding up heap scan.
>
> > If there is a way then it would be easier to directly freeze those
> tuples quickly and advance the relfrozenxid for the table.
>
> Maybe we can track the oldest xid per page in a map like visiblity map
> or integrate it with visibility map. We need to freeze only pages that
> are all-visible and whose oldest xid is older than the cut-off xid. I
> think we need to track both xid and multi xid.
>
> Regards,
>
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


TOAST - why separate visibility map

2021-11-19 Thread Virender Singla
Why do TOAST tables have their own visibility map and xmin, xmax columns etc.?
Isn't that increasing the row size in a TOAST table and adding more complexity?

Ideally all the vacuum cleanup on a TOAST table could be driven by the primary
table's xmin, xmax and VM info. Yes, that glues any cleanup on TOAST to the
primary table.


Re: TOAST - why separate visibility map

2021-11-19 Thread Virender Singla
Another point: currently a manual VACUUM does cleanup/freeze on the primary
table first and then on the TOAST table. It looks like an easy pick to add a
configurable parameter that runs it on both tables in parallel.
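
A rough approximation is already possible by hand; a sketch assuming
PostgreSQL 14's PROCESS_TOAST option and an illustrative table "mytable":

-- session 1: vacuum the main table only (PG14+)
VACUUM (PROCESS_TOAST FALSE) mytable;

-- session 2: look up and vacuum the TOAST table directly
SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'mytable';
VACUUM pg_toast.pg_toast_12345;  -- name taken from the query above (illustrative)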

On Sat, Nov 20, 2021 at 12:15 AM Virender Singla 
wrote:

> Why do Toast tables have it's own visibility map and xmin, xmax columns
> etc?
> Isn't it increasing row size in a toast table and adding more complexity?
>
> Ideally all the vacuum cleanup on a TOAST can be done based on Primary
> table xmin,xmax and VM info. Yes, that makes any cleanup on TOAST to be
> glued up with the Primary table.
>
>
>


Re: TOAST - why separate visibility map

2021-11-25 Thread Virender Singla
"Given the size of toasted data, the overhead is unlikely to be a
significant overhead. It's much more an issue for the main table, where
narrow rows are common."

Completely agree, row size should not be a big concern for TOAST tables.

However, write amplification will happen with vacuum freeze, where transaction
ids need to be frozen in the wide TOAST tuples as well. I have not explored
whether TOAST keeps separate hint-bit info too; if it does, it means a normal
vacuum (or a SELECT after a WRITE) has to completely rewrite the big TOAST
tuples along with the small main-table rows to set the hint bits
(commit/rollback).

I believe a B-tree index does not contain any separate visibility info, so the
only work VACUUM does on indexes is cleaning up dead tuples.

With a single copy of the visibility info, the above operations could be much
faster. The main-table and TOAST vacuuming processes would then be glued
together, though one can imagine optimizations such as two synchronized
workers doing the cleanup on the main and TOAST tables together. Agreed that
HOT updates are gone in TOAST if there is a common VM.

Overall this looks complex.

On Sat, Nov 20, 2021 at 9:46 PM Tom Lane  wrote:

> Andres Freund  writes:
> > On November 19, 2021 12:31:00 PM PST, Tom Lane 
> wrote:
> >> It might be feasible to drop the visibility map for toast tables,
> though.
>
> > I think it be a bad idea - the VM is used by vacuum to avoid rereading
> already vacuumed ranges. Loosing that for large toast tables would be bad.
>
> Ah, right.  I was thinking vacuuming depended on the other map fork,
> but of course it needs this one.
>
> In short, there are indeed good reasons why it works like this.
>
> regards, tom lane
>


vacuum on table1 skips rows because of a query on table2

2019-10-25 Thread Virender Singla
Currently I see the following vacuum behavior: even when a long-running query
in another read-committed transaction is executing against a different table,
vacuum skips the dead rows until that long-running query finishes.
Why is that the case? A long-running query on the same table blocking vacuum
is understandable, but why does a query on a different table block it?


Re: vacuum on table1 skips rows because of a query on table2

2019-10-26 Thread Virender Singla
If the long-running transaction is "read committed", then we are sure that any
new query it issues (even against the same table1 that vacuum is processing)
takes its snapshot at the point the query starts, not when the transaction
starts (and yet a read-committed transaction on table2 still causes vacuum on
table1 to skip rows).
Hence, if a vacuum on table1 sees that all transactions in the database are
"read committed" and none of them is accessing table1, the vacuum should be
able to clear the dead rows.
For read-committed transactions, different tables should not interfere with
each other.
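
The horizon vacuum actually honors can be observed per backend; any session
holding an old backend_xmin blocks cleanup database-wide:

SELECT pid, state, backend_xmin, age(backend_xmin) AS horizon_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;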

On Fri, Oct 25, 2019 at 10:16 PM Tom Lane  wrote:

> Virender Singla  writes:
> > Currently I see the vacuum behavior for a table is that, even if a long
> > running query on a different table is executing in another read committed
> > transaction.
> > That vacuum in the 1st transaction skips the dead rows until the long
> > running query finishes.
> > Why that is the case, On same table long running query blocking vacuum we
> > can understand but why query on a different table block it.
>
> Probably because vacuum's is-this-row-dead-to-everyone tests are based
> on the global xmin minimum.  This must be so, because even if the
> long-running transaction hasn't touched the table being vacuumed,
> we don't know that it won't do so in future.  So we can't remove
> rows that it should be able to see if it were to look.
>
> regards, tom lane
>


invoker function security issues

2022-06-08 Thread Virender Singla
I believe functions in Postgres follow a late-binding approach, so nested
function dependencies are resolved using search_path at run time. This way a
user can override nested functions in their own schema and change the
behaviour of wrapper functions. A more serious issue, however, is when
functional indexes (with nested function calls) are created on a table: the
data inserted into the index can then depend entirely on which user is
inserting it (by overriding a nested function).

I performed a couple of test cases where the inserted data depends on the user
overriding nested functions. I understand that scattering
functions/indexes/tables across different schemas and using this kind of
schema setup is not best practice, but I would still expect Postgres to save
us from such data-inconsistency issues by using early binding for functional
indexes. In fact Postgres does that linking for a single-function index (where
no nested functions are involved): it qualifies the function used in the index
with its schema name. It also works in cases where all functions, tables and
indexes live in the same schema.

However, there are still cases where functional indexes are created on
extension functions (for example, the cube extension) that live in a different
schema, and those cube functions are defined with invoker security and make
nested function calls without any schema qualification.

The issue that arises with late binding for functional indexes is that when we
migrate such tables/indexes/data from one database to another (using
pg_dump/pg_restore or any other method), the data can change depending on
which user performs the import.
(I performed these tests with invoker functions; I think definer functions
behave correctly.) One way out would be to define search_path on such nested
functions.


==Case1==

##Table and functions are in different schemas.

Session1::
User:Postgres

create user idxusr1 with password '*';
grant idxusr1 to postgres;
create schema idxusr1 AUTHORIZATION idxusr1;

create user idxusr2 with password '*';
grant idxusr2 to postgres;
create schema idxusr2 AUTHORIZATION idxusr2;

Session2::
User:idxusr1

set search_path to idxusr1,public;

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';

CREATE FUNCTION wrapsum(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';

##create table in another schema

create table public.test(n1 int);
create unique index idxtst on public.test(idxusr1.wrapsum(n1,1));

grant insert on table public.test to idxusr2;

postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.

Session3::
User:idxusr2

set search_path to idxusr2,public;

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1 - $2)';

postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (idxusr1.wrapsum(n1, 1))=(0) already exists.

==Case2==

##Functions are in different schemas.

Session1::
User:Postgres

create user idxusr1 with password '*';
grant idxusr1 to postgres;
create schema idxusr1 AUTHORIZATION idxusr1;

create user idxusr2 with password '*';
grant idxusr2 to postgres;
create schema idxusr2 AUTHORIZATION idxusr2;

Session2::
User:idxusr1

set search_path to idxusr1,public;

##create internal function in own schema and wrapper function in another schema.

CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';

CREATE FUNCTION public.wrapsum(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';

create table test(n1 int);
create unique index idxtst on test(public.wrapsum(n1,1));

grant usage on schema idxusr1 to idxusr2;
grant insert on table test to idxusr2;
postgres=> insert into test values(1);
INSERT 0 1
postgres=> insert into test values(1);
ERROR:  duplicate key value violates unique constraint "idxtst"
DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.

Session3::
User:idxusr2

set search_path to idxusr2,public;

CREATE
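
To illustrate the mitigation suggested above (a sketch, reusing
wrapsum/sumcall from Case1), the wrapper's search_path can be pinned so the
nested call resolves identically for every caller:

ALTER FUNCTION idxusr1.wrapsum(int, int) SET search_path = idxusr1, pg_catalog;

-- or schema-qualify the nested call in the function body:
CREATE OR REPLACE FUNCTION idxusr1.wrapsum(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT idxusr1.sumcall($1,$2)';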

Postgres default FILLFACTOR value

2020-05-08 Thread Virender Singla
Why is the Postgres default FILLFACTOR 100 for tables and 90 for indexes?

Although Oracle has a completely different MVCC architecture, it uses a
default of 90 for tables and 100 for indexes (the exact reverse of Postgres).

Postgres blocks need more space for row updates compared to Oracle (Oracle
keeps buffer space only for row expansion, whereas Postgres has to create a
new row version). Since Postgres is well suited to OLTP workloads, keeping the
TABLE FILLFACTOR at 90 seems more appropriate than straining to save storage
space. A lower FILLFACTOR also makes more UPDATEs eligible for HOT, which
would benefit users newly adopting Postgres who are initially unaware of this
setting and only realize it later, when VACUUMs run really long and indexes
get bloated.

On the other side, index FILLFACTOR matters only when building an index on an
existing populated table; any later row (a fresh INSERT, or an insert coming
through an UPDATE) can fill the block above the FILLFACTOR value anyway. I
think the default of 100 makes more sense there.
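
A minimal sketch of tuning this ("hot_updated" is an illustrative table name):

-- set at creation time...
CREATE TABLE hot_updated (id bigint, payload text) WITH (fillfactor = 90);
-- ...or later; only pages written afterwards honor the new value
ALTER TABLE hot_updated SET (fillfactor = 85);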


Re: Postgres default FILLFACTOR value

2020-05-17 Thread Virender Singla
In Postgres, index FILLFACTOR really pays off only for monotonically
increasing column values; for random values a page split is 50:50 anyway. But
it is quite unlikely that monotonically increasing columns get updated, so why
waste that 10% of space, and also make index range scans on such tables less
performant?

postgres=> create table test(id bigint);
CREATE TABLE
postgres=> CREATE INDEX idx1_test ON test (id) with (fillfactor = 100);
CREATE INDEX
postgres=> CREATE INDEX idx2_test ON test (id); -- defaults to 90
CREATE INDEX

postgres=> insert into test SELECT ceil(random() * 1000) from
generate_series(1, 10000000) AS temp (id) ;
INSERT 0 10000000

postgres=> \di+ idx1_test
                          List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx1_test | index | postgres | test  | 278 MB |

postgres=> \di+ idx2_test
                          List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx2_test | index | postgres | test  | 280 MB |

postgres=> update test set id = id+1 where id%100=0;
UPDATE 99671
postgres=> \di+ idx1_test
                          List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx1_test | index | postgres | test  | 281 MB |

postgres=> \di+ idx2_test
                          List of relations
 Schema |   Name    | Type  |  Owner   | Table |  Size  | Description
--------+-----------+-------+----------+-------+--------+-------------
 public | idx2_test | index | postgres | test  | 282 MB |

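A hypothetical cross-check with the contrib pgstattuple extension would show
the actual leaf density of each index:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density FROM pgstatindex('idx1_test');
SELECT avg_leaf_density FROM pgstatindex('idx2_test');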

On Fri, May 8, 2020 at 1:50 PM Virender Singla 
wrote:

> Why Postgres default FILLFACTOR for table is 100 and for Index is 90.
>
> Although Oracle is having completely different MVCC architecture, it uses
> default 90 for table and 100 for Index (exact reverse of Postgres)
>
> Postgres blocks needed more spaces for row update compares to Oracle
> (because Oracle keeps buffer space only for row expansion, whereas Postgres
> need to create new versioned row). As I see Postgres is more suitable for
> OLTP workload, keeping TABLE FILLFACTOR value to 90 is more suitable rather
> than stressing to save storage space. Less FILLFACTOR value will be useful
> to make UPDATEs as HOT applicable as well and that is going to benefit new
> Postgres adopting users who are initially not aware of such setting and
> only realize this later when VACUUM are really running long and Indexes
> gets bloated. .
>
> Other side Index FILLFACTOR makes sense only for existing populated tables
> and for any row (new INSERTs or INSERT coming through UPDATEs), it can fill
> the block above FILLFACTOR value. I think 100 default make more sense here.
>
>
>
>