Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread 王旭
Hello!


My table is described as below:





CREATE TABLE IF NOT EXISTS mytable
(
    uuid varchar(45) NOT NULL,
    symbol_id smallint NOT NULL,
    ...
    ...
    PRIMARY KEY (symbol_id, uuid)
) partition by hash(symbol_id);


create table mytable_0 partition of mytable FOR VALUES WITH (MODULUS 3, REMAINDER 0);
create table mytable_1 partition of mytable FOR VALUES WITH (MODULUS 3, REMAINDER 1);
create table mytable_2 partition of mytable FOR VALUES WITH (MODULUS 3, REMAINDER 2);






I understand that I can find the specific child table using EXPLAIN. But is
there any simpler way in which I can get the name of the child table for a
specific symbol_id value, so that I can execute the query on that child table
instead of on the partitioned master table?



I tried something like this, e.g. for symbol_id 6365:

SELECT (hashint2(6365::SMALLINT) % 10)

But the result doesn't make any sense and cannot be used to determine which
child table (mytable_0, mytable_1, mytable_2) my record is in.


The reason is that my frequent query doesn't need the primary key (it uses
some other index), so querying a specific child table directly would give me
a much bigger performance improvement (the query would execute without
loading the primary key index at all).





Is there any simple function to do this job? Can someone help me on this?




Many Thanks,
James.

Re: How to run a task continuously in the background

2019-07-17 Thread Rob Sargent


> On Jul 17, 2019, at 1:26 AM, Dirk Mika  wrote:
> 
>  
> We used a trigger that called pg_notify
> (https://www.postgresql.org/docs/9.5/sql-notify.html) and then had another
> process that LISTENed for notifications.
> 
>  
> 
> What kind of process is this? I'm assuming that this is an application 
> written in C.
> 
>  
> 
> The advantage of LISTEN / NOTIFY is only that the process which should 
> process data does not have to do polling, but is notified when there is 
> something to do.
> 
>  
> 
> Dirk
> 
Also, the NOTIFY wouldn’t significantly extend the lifetime or impact of the 
trigger.
>  
> 
> -- 
> Dirk Mika
> Software Developer
> 
> 
> 
> mika:timing GmbH
> Strundepark - Kürtener Str. 11b
> 51465 Bergisch Gladbach
> Germany
> 
> 
> fon +49 2202 2401-1197 
> dirk.m...@mikatiming.de
> www.mikatiming.de
> 
> 
> AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 
> Geschäftsführer: Harald Mika, Jörg Mika
> 
>  
> 


Re: How to run a task continuously in the background

2019-07-17 Thread Luca Ferrari
On Wed, Jul 17, 2019 at 9:38 AM Dirk Mika  wrote:
> That is basically still my main question. How do I start a background job 
> (e.g. a function) which waits by polling or LISTEN / NOTIFY for records in a 
> table to be processed.


You will have a trigger that, once new tuples are created (or old ones
updated, and so on), issues a NOTIFY.
Somewhere (within PostgreSQL or outside it) there will be a process that
issued a LISTEN and blocks until a notification comes in. Then it processes
whatever you need to do.
As an example, your trigger function and your listening process could look
something like the sketch below.
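A minimal sketch (the table, column, and channel names here are hypothetical
placeholders, not anything from your schema):

-- Hypothetical queue table; adjust to your own schema.
CREATE TABLE IF NOT EXISTS work_queue (
    id        bigserial PRIMARY KEY,
    payload   text NOT NULL,
    processed boolean NOT NULL DEFAULT false
);

-- Trigger function: send a notification on channel 'new_work' for each insert.
CREATE OR REPLACE FUNCTION notify_new_work() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_work', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER work_queue_notify
    AFTER INSERT ON work_queue
    FOR EACH ROW EXECUTE PROCEDURE notify_new_work();

-- The listening side (psql, or any client driver) simply issues:
LISTEN new_work;
-- ...and wakes up when a notification arrives, then processes the new rows.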

This makes your processing fully asynchronous, and with some tuning it
allows you to decide the start/stop/resume policy as you need/wish.

Besides, it is quite hard for me to see why you need to check for new data
every second, and therefore why you are not satisfied with pg_cron or
something like that.

Luca




Re: Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread Luca Ferrari
On Wed, Jul 17, 2019 at 9:19 AM 王旭  wrote:
> I tried something like this:  e.g., for symbol_id 6365,
> SELECT (hashint2(6365::SMALLINT)% 10)

shouldn't this be modulus 3 instead of 10?
The problem is that record 6365 is not where you expected to be?

As far as I know, there is no easy user-level way to get the route to
a table, but you can juggle with the expression that defined each
table and make a good guess.
However, your query should give a good idea:

# SELECT 'my_table_' || (hashint2(6365::smallint) % 3);
  ?column?
------------
 my_table_2
(1 row)



Re: Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread James(王旭)
Thank you, Luca.


Yes, my intention was "SELECT (hashint2(6365::SMALLINT) % 3)"; that's my
mistake, thank you for pointing it out.

Actually I just did a simple query on a single child table:


SELECT distinct(symbol_id) FROM mytable_0


and I got these results:


"symbol_id"
6521
1478
1964
5642
7470
1158
2429
9882
4542
5196
9178
8303
1091
9435
8133
1437
9072




From these results I can tell that the routing to a child table is not even
related to the mod function, right?
So it's hard for me to make any kind of guess...
 

Re: Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread James(王旭)
Here's my PG version:


PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 
(Red Hat 4.8.5-36), 64-bit


Hope this helps. 

Re: Matview size - space increased on concurrently refresh

2019-07-17 Thread Nicola Contu
On Sun, Jul 14, 2019 at 10:23 PM Tom Lane wrote:

> [ please do not top-post in your replies, it makes the conversation hard
>   to follow ]
>
> Nicola Contu  writes:
> > On Sun, Jul 14, 2019 at 9:34 PM Kaixi Luo wrote:
> >> This is normal and something to be expected. When refreshing the
> >> materialized view, the new data is written to a disk and then the two
> >> tables are diffed. After the refresh finishes, your view size should go
> >> back to normal.
>
> > It does not. That's the issue.
> > It always increases by 120mb, and it has reached 12gb instead of just 180mb.
>
> A concurrent matview refresh will necessarily leave behind two copies
> of any rows it changes, just like any other row-update operation in
> Postgres.  Once there are no concurrent transactions that can "see"
> the old row copies, they should be reclaimable by vacuum.
>
> Since you're not seeing autovacuum reclaim the space automatically,
> I hypothesize that you've got autovacuum turned off or dialed down
> to unrealistically non-aggressive settings.  Or possibly you have
> old open transactions that are preventing reclaiming dead rows
> (because they can still possibly "see" those rows).  Either of those
> explanations should imply that you're getting similar bloat in every
> other table and matview, though.
>
> You might want to look into pg_stat_all_tables to see what it says
> about the last_autovacuum time etc. for that matview.  Another source
> of insight is to do a manual "vacuum verbose" on the matview and see
> what that says about removable and nonremovable rows.
>
> regards, tom lane
>

This matview has nothing strange and nothing custom.
We can reproduce this with a matview that is not used by anyone.

cmdv3=# vacuum (full,analyze,verbose) public.matview_nm_connections;
INFO:  vacuuming "public.matview_nm_connections"
INFO:  "matview_nm_connections": found 0 removable, 295877 nonremovable row
versions in 33654 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.45 s, system: 0.43 s, elapsed: 1.24 s.
INFO:  analyzing "public.matview_nm_connections"
INFO:  "matview_nm_connections": scanned 16986 of 16986 pages, containing
295877 live rows and 0 dead rows; 3 rows in sample, 295877 estimated
total rows
VACUUM

This is an example of a full and verbose vacuum. Every time I refresh the
matview, its size increases.

See stats from the pg_stat_all_tables :

cmdv3=# select * from pg_stat_all_tables where relname =
'matview_nm_connections';
-[ RECORD 1 ]-------+------------------------------
relid               | 3466831733
schemaname          | public
relname             | matview_nm_connections
seq_scan            | 3725
seq_tup_read        | 540992219
idx_scan            | 33235
idx_tup_fetch       | 255113
n_tup_ins           | 96874161
n_tup_upd           | 0
n_tup_del           | 95692276
n_tup_hot_upd       | 0
n_live_tup          | 295877
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2019-07-12 11:58:39.198049+00
last_autovacuum     | 2019-07-16 11:07:02.765612+00
last_analyze        | 2019-07-17 10:28:08.819679+00
last_autoanalyze    | 2019-07-16 11:03:32.489573+00
vacuum_count        | 5
autovacuum_count    | 29
analyze_count       | 11
autoanalyze_count   | 17


Re: How to run a task continuously in the background

2019-07-17 Thread Weatherby,Gerard
We used a python process running continually on a linux client for the LISTEN 
piece.

Reading the documentation, it seems that a background worker ( 
https://www.postgresql.org/docs/11/bgworker.html) might be a solution to your 
requirements. I don’t have personal experience with them.
--
Gerard Weatherby | Application Architect
NMRbox | Department of Molecular Biology and Biophysics | UConn Health
263 Farmington Avenue, Farmington, CT 06030-6406
Phone: 860 679 8484
uchc.edu

On Jul 17, 2019, at 5:57 AM, Dirk Mika <dirk.m...@mikatiming.de> wrote:

You will have a trigger that, once new tuples are created (or older
update and so on) issues a NOTIFY.
Somewhere (within PostgreSQL or outside it) there will be a process
that issued a LISTEN and is locked until a notify comes in. Then it
does process whatever you need to do.
As an example your trigger function will be something like ...
and your listening process will be something like ...

Thanks for the examples. I’ll look into them.

This makes your processing fully asynchronous, and with some tune
allows you to decide the start/stop/resume policy as you need/wish.

Besides, it is quite hard for me to get to the point where you need to
check for new data every second, and therefore why you are not
satisfied with pg_cron or stuff like that.

pg_cron doesn’t start the task instantly and queues subsequent runs, if the 
task is still running. I just need to start the task once and it should keep 
running until stopped / killed.

Maybe I'll have to rephrase it.
Suppose I have a procedure and want to start it without the client that
starts it having to wait for it to finish. And I want the procedure to
continue even if the client that started it quits.
And I want to be able to check whether the procedure is still running.

Dirk

--
Dirk Mika
Software Developer





mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany



fon +49 2202 2401-1197
dirk.m...@mikatiming.de
www.mikatiming.de



AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika








Re: Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread Luca Ferrari
On Wed, Jul 17, 2019 at 11:41 AM James(王旭)  wrote:
> From these results I can tell the route to a table is not even related with 
> the mod function, right?
> So It's hard for me to do any kind of guesses...

Because it is the wrong function.
According to \d+ on a child table and partbounds.c, the function called
is satisfies_hash_partition:

testdb=# select satisfies_hash_partition('153221'::oid, 3, 0, 6521);
 satisfies_hash_partition
--------------------------
 t
(1 row)

testdb=# select satisfies_hash_partition('153221'::oid, 3, 1, 6521);
 satisfies_hash_partition
--------------------------
 f
(1 row)

The first argument is the table OID (the partitioned one, the root), the
second is the modulus, the third is the remainder, and the last is your
value.
Therefore I suspect you have to iterate over the remainders from 0 to x to
see which partition a value falls into, and then derive the child table
name from that.
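For instance, a sketch against the example schema upthread (parent "mytable",
MODULUS 3, partitions named mytable_<remainder>, smallint partition key):

SELECT format('mytable_%s', r) AS child_table
FROM generate_series(0, 2) AS r
WHERE satisfies_hash_partition('mytable'::regclass::oid, 3, r, 6365::smallint);

-- For a row that already exists, the tableoid system column reports the
-- child table directly:
SELECT tableoid::regclass AS child_table
FROM mytable
WHERE symbol_id = 6365
LIMIT 1;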

Hope it's clear.

Luca




Re: Issue related to partitioned table: How can I quickly determine which child table my record is in, given a specific primary key value?

2019-07-17 Thread James(王旭)
Hi Luca,


Yes, that's the answer. It really works!
Thanks again, Luca, you actually saved my day!


James.

Re: How to run a task continuously in the background

2019-07-17 Thread Rory Campbell-Lange
On 17/07/19, Luca Ferrari (fluca1...@gmail.com) wrote:
> On Wed, Jul 17, 2019 at 9:38 AM Dirk Mika  wrote:
> > That is basically still my main question. How do I start a
> > background job (e.g. a function) which waits by polling or LISTEN /
> > NOTIFY for records in a table to be processed.
> 
> You will have a trigger that, once new tuples are created (or older
> update and so on) issues a NOTIFY.
> Somewhere (within PostgreSQL or outside it) there will be a process
> that issued a LISTEN and is locked until a notify comes in. Then it
> does process whatever you need to do.

We make extensive use of postgresql 'contacting' an external process,
but our use case involves many databases in a cluster rather than many
schemas. Also we don't have to deal with cancelling the external
process. We chose this architecture to avoid many postgresql
connections for LISTEN/NOTIFY.

We use a pgmemcache interface trigger to update memcached with a
notification of the database holding items to be flushed. A python
looping process running under supervisord checks memcache for items to
be dealt with, and does so in a serial manner.

For the use case mentioned a per-schema process may be required or a
sub-process/thread created, which could check perhaps for memcache to
signal cancellation of processing for the schema. I guess one might then
have thread locking/cancellation issues to resolve.

Rory




Removing a key from jsonb is sloooow

2019-07-17 Thread Volkan Unsal
I'm trying to remove a key from a jsonb column in a table with 10K rows,
and the performance is abysmal. When the key is missing, it takes 5
minutes. When the key is present, it takes even longer.

Test with non-existent key:

>> update projects set misc = misc - 'foo';
Time: 324711.960 ms (05:24.712)

What can I do to improve this?


Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Adrian Klaver

On 7/17/19 7:30 AM, Volkan Unsal wrote:
I'm trying to remove a key from a jsonb column in a table with 10K rows, 
and the performance is abysmal. When the key is missing, it takes 5 
minutes. When the key is present, it takes even longer.


Test with non-existent key:

 >> update projects set misc = misc - 'foo';
Time: 324711.960 ms (05:24.712)

What can I do to improve this?


Provide some useful information:

1) Postgres version

2) Table schema

3) Explain analyze of query



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Tumasgiu Rossini
Hi,

Your current query actually processes all of your 10K rows, which is a bit of
overkill if only a few rows contain the key you want to delete.
Depending on how big your JSON data is, this could be problematic.

Have you considered adding a WHERE clause to your query?

Also, maybe you could create an index on your jsonb column to speed up
identifying the rows which contain the key you want to delete?
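For example, a sketch using the table and key names from this thread (the
index name is arbitrary; the default jsonb GIN opclass supports the ?
key-exists operator):

CREATE INDEX IF NOT EXISTS projects_misc_gin ON projects USING gin (misc);

UPDATE projects
SET misc = misc - 'foo'
WHERE misc ? 'foo';    -- only rewrites rows that actually contain the key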





Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Adrian Klaver

On 7/17/19 7:58 AM, Volkan Unsal wrote:

Please post to list also.
Ccing list

@Adrian

More information about my setup:

Postgres version:
PostgreSQL 10.9 (Debian 10.9-1.pgdg90+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit


Table schema:
CREATE TABLE public.projects (
    misc jsonb DEFAULT '{}'::jsonb NOT NULL
);

Explain analyze:
explain analyze update projects set misc = misc - 'foo';

Update on projects  (cost=0.00..4240.93 rows=10314 width=1149) (actual 
time=346318.291..346318.295 rows=0 loops=1)
   ->  Seq Scan on projects  (cost=0.00..4240.93 rows=10314 width=1149) 
(actual time=1.011..266.435 rows=10314 loops=1)

Planning time: 40.087 ms
Trigger trigger_populate_tsv_body_on_projects: time=341202.492 calls=10314
Execution time: 346320.260 ms

Time: 345969.035 ms (05:45.969)






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Volkan Unsal
More information about my setup:

Postgres version:
PostgreSQL 10.9 (Debian 10.9-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Table schema:
CREATE TABLE public.projects (
   misc jsonb DEFAULT '{}'::jsonb NOT NULL
);

Explain analyze:
explain analyze update projects set misc = misc - 'foo';

Update on projects  (cost=0.00..4240.93 rows=10314 width=1149) (actual
time=346318.291..346318.295 rows=0 loops=1)
  ->  Seq Scan on projects  (cost=0.00..4240.93 rows=10314 width=1149)
(actual time=1.011..266.435 rows=10314 loops=1)
Planning time: 40.087 ms

Trigger trigger_populate_tsv_body_on_projects: time=341202.492 calls=10314

Execution time: 346320.260 ms


Time: 345969.035 ms (05:45.969)


Figured out that it's due to the trigger. Thanks for your help, Adrian!




-- 
*Volkan Unsal*
*Product Engineer*
volkanunsal.com


Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Adrian Klaver

On 7/17/19 7:59 AM, Volkan Unsal wrote:


Aha, it's due to the trigger, isn't it?


Yes.





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Removing a key from jsonb is sloooow

2019-07-17 Thread Tom Lane
Volkan Unsal  writes:
> I'm trying to remove a key from a jsonb column in a table with 10K rows,
> and the performance is abysmal. When the key is missing, it takes 5
> minutes. When the key is present, it takes even longer.

How wide are the jsonb values?  It seems likely that most of this
is TOAST overhead [1], ie time to reassemble wide jsonb values
and then split them up again.

As Tumasgiu already mentioned, it'd likely be useful to suppress
updates of rows that don't actually need to change, assuming that
the key appears in a minority of rows.  And an index could help
even more, by avoiding the need to reconstruct wide values to
see if the key appears in them.

Of course, if most of the rows need an update, neither of these
will help and you just gotta live with it.  Possibly reconsider
your approach of using a large JSONB value to contain fields
you need to update individually.  That's never going to be great
for performance.  SQL (or at least Postgres) is incapable of
updating portions of columns efficiently.

regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-toast.html




Corrupt index stopping autovacuum system wide

2019-07-17 Thread Aaron Pelz
Hello,

Over the weekend we noticed that our max transaction IDs have been
continuously increasing - got an alert it passed 1B - and we believe that
no autovacuums were running for a period of about a month by looking at
pg_stat_user_tables. We had not updated any autovac tuning parameters over
that time period and many tables were very much over the threshold for
needing an autovac.

When investigating I located the table with the oldest transaction ID with:

SELECT c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age,
pg_size_pretty(pg_table_size(c.oid)) as table_size,
c.relkind,
c.relpersistence
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind in ('r', 't','m')
order by age desc limit 40;
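A related check at the database level (a sketch along the same lines; the age
of pg_database.datfrozenxid shows how far each database has fallen behind):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;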

I vacuumed that table manually with `vacuum freeze verbose table_xx` and
got this error:

INFO:  aggressively vacuuming "public.table_xx"
INFO:  scanned index "table_xx_pkey" to remove 168443 row versions
DETAIL:  CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.13 s
INFO:  scanned index "table_xx_col_id" to remove 168443 row versions
DETAIL:  CPU: user: 0.16 s, system: 0.00 s, elapsed: 0.16 s
ERROR:  failed to re-find parent key in index "table_xx_col_idx" for
deletion target page 217

I replaced the index (created a new one concurrently, dropped the old one
concurrently), vacuumed the table, and immediately autovacuums started across
the system and our XIDs started falling; a sketch of the replacement is below.
To me it looks like a *single* corrupt index held up autovacuums across our
entire server, even in other databases on the same server. Am I interpreting
this correctly? Would love guidance on diagnosing this type of thing and
strategies for preventing it.
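The replacement was along these lines (a sketch; the indexed column list here
is a placeholder, not the real definition):

CREATE INDEX CONCURRENTLY table_xx_col_idx_new ON table_xx (col);
DROP INDEX CONCURRENTLY table_xx_col_idx;
ALTER INDEX table_xx_col_idx_new RENAME TO table_xx_col_idx;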

Thanks,
Aaron


Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 8:54 AM Aaron Pelz  wrote:
> To me it looks like a *single* corrupt index held up autovacuums across our 
> entire server, even other in other databases on the same server. Am I 
> interpreting this correctly?

Yes -- that is correct.

What PostgreSQL version are you on? Was this an INCLUDE index on PostgreSQL 11?

> Would love guidance on diagnosing this type of thing and strategies for 
> preventing it.

It's possible that amcheck would have given you an accurate diagnosis
of the problem -- especially if you used bt_index_parent_check():

https://www.postgresql.org/docs/current/amcheck.html
-- 
Peter Geoghegan




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Aaron Pelz
> What PostgreSQL version are you on? Was this an INCLUDE index on
PostgreSQL 11?

On 11, and no it was just a normal btree.

> It's possible that amcheck would have given you an accurate diagnosis
> of the problem -- especially if you used bt_index_parent_check():

I'll look into this, seems helpful. Thanks!



Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 17, 2019 at 8:54 AM Aaron Pelz  wrote:
>> To me it looks like a *single* corrupt index held up autovacuums across our 
>> entire server, even other in other databases on the same server. Am I 
>> interpreting this correctly?

> Yes -- that is correct.

It looks that way, but how would a broken non-shared index have held up
autovacuuming in other databases?  Maybe, as this one's xmin horizon
got further and further behind, the launcher eventually stopped
considering launching workers into any other databases?  That seems
like a bad thing; it's postponing work that will need to be done
eventually.

regards, tom lane




Change in db size

2019-07-17 Thread Sonam Sharma
I have restored database and the db size of source was around 55gb and
after restore the db size of Target is 47gb.

How to confirm if restore was successful or not ?
The table count is also same.


Re: Change in db size

2019-07-17 Thread David G. Johnston
On Wed, Jul 17, 2019 at 10:03 AM Sonam Sharma  wrote:

> I have restored database and the db size of source was around 55gb and
> after restore the db size of Target is 47gb.
>

You haven't described how you restored the database but measuring size
doesn't work because typically restoring a table causes it to be (nearly?)
bloat free while a long running system will accumulate bloat.


> How to confirm if restore was successful or not ?
>

A lack of error messages is a good primary indicator.


> The table count is also same.
>

Good.

Beyond that you need to decide what amount of internal data validation you
can live with.  It also depends on whether you have expected numbers based
upon the data being restored.

David J.


Re: [External] Change in db size

2019-07-17 Thread Vijaykumar Jain
I guess the restore cleared the bloat from the tables.
\dt+
\di+
If you run the above commands in psql, you will see the difference in the
sizes of the objects.
Also, querying pg_stat_all_tables you might see "dead" tuples in the old
tables, which would have been cleaned up now.
Also, if there were any large temp objects or queries using temp files, they
would have been cleared and hence reclaimed disk space.
If none of these are true, then I guess the experts will pitch in.
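For example, a quick check (a sketch) for tables that still carry dead tuples
on the source side:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;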

-- 

Regards,
Vijay


Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 9:57 AM Tom Lane  wrote:
> It looks that way, but how would a broken non-shared index have held up
> autovacuuming in other databases?  Maybe, as this one's xmin horizon
> got further and further behind, the launcher eventually stopped
> considering launching workers into any other databases?  That seems
> like a bad thing; it's postponing work that will need to be done
> eventually.

I don't know exactly how the launcher would behave offhand, but it's
clear that not being able to VACUUM one table in one database (because
it has a corrupt index) ultimately risks the availability of every
database in the cluster. Many installations receive little to no
supervision, so it may just be a matter of time there. That is
certainly a bad thing.

-- 
Peter Geoghegan




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 9:21 AM Peter Geoghegan  wrote:
> It's possible that amcheck would have given you an accurate diagnosis
> of the problem -- especially if you used bt_index_parent_check():
>
> https://www.postgresql.org/docs/current/amcheck.html

BTW, be sure to use the 'heapallindexed' option with
bt_index_parent_check() to detect missing downlinks, which is exactly
the problem that VACUUM complained about. Hopefully the problem will be
limited to the single index that you've already REINDEXed. In
theory the same problem could be hiding in other indexes, though I
don't consider that particularly likely.

Note that bt_index_parent_check() requires a lock on tables that
effectively blocks writes, but not reads, so verification may require
planning or coordination. bt_index_check() doesn't have any of these
problems, but also won't detect missing downlinks specifically.
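For example (a sketch, assuming the amcheck extension is available and using
the index name from upthread):

CREATE EXTENSION IF NOT EXISTS amcheck;

-- Blocks writes (but not reads) on the table while it runs:
SELECT bt_index_parent_check('table_xx_col_idx'::regclass, heapallindexed => true);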

-- 
Peter Geoghegan




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 17, 2019 at 9:57 AM Tom Lane  wrote:
>> It looks that way, but how would a broken non-shared index have held up
>> autovacuuming in other databases?  Maybe, as this one's xmin horizon
>> got further and further behind, the launcher eventually stopped
>> considering launching workers into any other databases?  That seems
>> like a bad thing; it's postponing work that will need to be done
>> eventually.

> I don't know exactly how the launcher would behave offhand, but it's
> clear that not being able to VACUUM one table in one database (because
> it has a corrupt index) ultimately risks the availability of every
> database in the cluster. Many installations receive little to no
> supervision, so it may just be a matter of time there. That is
> certainly a bad thing.

Right, you're eventually going to get to a forced shutdown if vacuum never
succeeds on one table; no question that that's bad.  My concern here is
that if we have blinders on to the extent of only processing that one
table or DB, we're unnecessarily allowing bloat to occur in other tables,
and causing that missed vacuuming work to pile up so that there's more of
it to be done once the breakage is cleared.  If the DBA doesn't notice the
problem until getting into a forced shutdown, that is going to extend his
outage time --- and, in a really bad worst case, maybe make the difference
between being able to recover at all and not.

regards, tom lane




Re: Change in db size

2019-07-17 Thread Sonam Sharma
I took the backup using pg_dump with gzip option and restored it with psql.



Re: disable and enable trigger all when a foreign keys

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-12, Tom Lane wrote:

> The OP already found the best method to get out of this, which is to
> drop and re-add the FK constraint.  Adding the constraint will force
> a full verification scan, which is what you need here since there's
> no information available about which checks were missed.

Maybe it would make sense to be able to mark the FK as not validated, so
that you can apply ALTER CONSTRAINT ... VALIDATE.  Seems better than
DROP + ADD.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Why no CREATE TEMP MATERIALIZED VIEW ?

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-16, Ivan Voras wrote:

> Hello,
> 
> Out of curiosity, since there's CREATE TEMP VIEW, any particular reason
> there's no CREATE TEMP MATERIALIZED VIEW?

Because it hasn't been implemented.  There is a patch, but it's not
done.  See this thread:
https://postgr.es/m/CAKLmikNoQR4ZNg_wt=-hu-uuxv2cxvdveersmfjr2saopvx...@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 10:27 AM Tom Lane  wrote:
> Right, you're eventually going to get to a forced shutdown if vacuum never
> succeeds on one table; no question that that's bad.

It occurs to me that we use operator class/insertion scankey
comparisons within page deletion, to relocate a leaf page that looks
like a candidate for deletion. Despite this, README.hot claims:

"Standard vacuuming scans the indexes to ensure all such index entries
are removed, amortizing the index scan cost across as many dead tuples
as possible; this approach does not scale down well to the case of
reclaiming just a few tuples.  In principle one could recompute the
index keys and do standard index searches to find the index entries,
but this is risky in the presence of possibly-buggy user-defined
functions in functional indexes.  An allegedly immutable function that
in fact is not immutable might prevent us from re-finding an index
entry"

That probably wasn't the problem in Aaron's case, but it is worth
considering as a possibility.

> My concern here is
> that if we have blinders on to the extent of only processing that one
> table or DB, we're unnecessarily allowing bloat to occur in other tables,
> and causing that missed vacuuming work to pile up so that there's more of
> it to be done once the breakage is cleared.  If the DBA doesn't notice the
> problem until getting into a forced shutdown, that is going to extend his
> outage time --- and, in a really bad worst case, maybe make the difference
> between being able to recover at all and not.

The comment about "...any db at risk of Xid wraparound..." within
do_start_worker() hints at such a problem.

Maybe nbtree VACUUM should do something more aggressive than give up
when there is a "failed to re-find parent key" or similar condition.
Perhaps it would make more sense to make the index inactive (for some
value of "inactive") instead of just complaining. That might be the
least worst option, all things considered.

--
Peter Geoghegan




Looking for Postgres upgrade Matrix

2019-07-17 Thread Perumal Raj
Hi Team,

Do we have any reference link which explains the various upgrade paths (direct
/ indirect) using pg_upgrade or the latest utility?

I hope pg_dump can be used from any lower version to a higher version. Please
correct me if I am wrong.

Thanks,
Raj


Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-17, Peter Geoghegan wrote:

> Maybe nbtree VACUUM should do something more aggressive than give up
> when there is a "failed to re-find parent key" or similar condition.
> Perhaps it would make more sense to make the index inactive (for some
> value of "inactive") instead of just complaining. That might be the
> least worst option, all things considered.

Maybe we can mark an index as unvacuumable in some way?  As far as I
understand, all queries using that index work, as do index updates; it's
just vacuuming that fails.  If we mark the index as unvacuumable, then
vacuum just skips it (and does not run phase 3 for that table), and
things can proceed; the table's age can still be advanced.  Obviously
it'll result in more bloat than in normal condition, but it shouldn't
cause the whole cluster to go down.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Looking for Postgres upgrade Matrix

2019-07-17 Thread David G. Johnston
On Wed, Jul 17, 2019 at 11:16 AM Perumal Raj  wrote:

> Hi Team,
>
> Do we have any reference link which explain various  upgrade path ( Direct
> / indirect) by using pg_upgrade or latest utility.
>
> I hope pg_dump can be used from any lower version to Higher version.
> Please correct me if I am wrong.
>

From the v11 documentation:

Because pg_dump is used to transfer data to newer versions of PostgreSQL,
the output of pg_dump can be expected to load into PostgreSQL server
versions newer than pg_dump's version. pg_dump can also dump from
PostgreSQL servers older than its own version. (Currently, servers back to
version 8.0 are supported.) However, pg_dump cannot dump from PostgreSQL
servers newer than its own major version; it will refuse to even try,
rather than risk making an invalid dump. Also, it is not guaranteed that
pg_dump's output can be loaded into a server of an older major version —
not even if the dump was taken from a server of that version. Loading a
dump file into an older server may require manual editing of the dump file
to remove syntax not understood by the older server. Use of the
--quote-all-identifiers option is recommended in cross-version cases, as it
can prevent problems arising from varying reserved-word lists in different
PostgreSQL versions.

David J.


Re: Looking for Postgres upgrade Matrix

2019-07-17 Thread Adrian Klaver

On 7/17/19 11:15 AM, Perumal Raj wrote:

Hi Team,

Do we have any reference link which explain various  upgrade path ( 
Direct / indirect) by using pg_upgrade or latest utility.


https://www.postgresql.org/docs/11/backup.html

What exactly are you trying to do?



I hope pg_dump can be used from any lower version to Higher version. 


As long as you use the newer pg_dump to dump the older version.


Please correct me if I am wrong.





Thanks,
Raj




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jul-17, Peter Geoghegan wrote:
>> Maybe nbtree VACUUM should do something more aggressive than give up
>> when there is a "failed to re-find parent key" or similar condition.
>> Perhaps it would make more sense to make the index inactive (for some
>> value of "inactive") instead of just complaining. That might be the
>> least worst option, all things considered.

> Maybe we can mark an index as unvacuumable in some way?  As far as I
> understand, all queries using that index work, as do index updates; it's
> just vacuuming that fails.  If we mark the index as unvacuumable, then
> vacuum just skips it (and does not run phase 3 for that table), and
> things can proceed; the table's age can still be advanced.  Obviously
> it'll result in more bloat than in normal condition, but it shouldn't
> cause the whole cluster to go down.

If an index is corrupt enough to break vacuum, I think it takes a rather
large leap of faith to believe that it's not going to cause problems for
inserts or searches.  I'd go with just marking the index broken and
insisting that it be REINDEX'd before we touch it again.

Not sure how we make that happen automatically though, or if we even
should.
(a) once the transaction's failed, you can't go making catalog updates; 
(b) even when you know the transaction's failed, blaming it on a
particular index seems a bit chancy; 
(c) automatically disabling constraint indexes seems less than desirable.

regards, tom lane




Re: Looking for Postgres upgrade Matrix

2019-07-17 Thread Perumal Raj
Thanks Adrian, David,

Basically, I want to upgrade a few 9.X/8.X version DBs to some stable
version (10.X / 11.X), at the same time with as little downtime as possible.
So I want to understand whether a direct upgrade is possible between major
releases.

Thanks,



Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-17, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Jul-17, Peter Geoghegan wrote:
> >> Maybe nbtree VACUUM should do something more aggressive than give up
> >> when there is a "failed to re-find parent key" or similar condition.
> >> Perhaps it would make more sense to make the index inactive (for some
> >> value of "inactive") instead of just complaining. That might be the
> >> least worst option, all things considered.
> 
> > Maybe we can mark an index as unvacuumable in some way?  As far as I
> > understand, all queries using that index work, as do index updates; it's
> > just vacuuming that fails.  If we mark the index as unvacuumable, then
> > vacuum just skips it (and does not run phase 3 for that table), and
> > things can proceed; the table's age can still be advanced.  Obviously
> > it'll result in more bloat than in normal condition, but it shouldn't
> > cause the whole cluster to go down.
> 
> If an index is corrupt enough to break vacuum, I think it takes a rather
> large leap of faith to believe that it's not going to cause problems for
> inserts or searches.

Maybe, but it's what happened in the reported case.  (Note Aaron was
careful to do the index replacement concurrently -- he wouldn't have
done that if the table wasn't in active use.)

> I'd go with just marking the index broken and
> insisting that it be REINDEX'd before we touch it again.

This might make things worse operationally, though.  If searches aren't
failing but vacuum is, we'd break a production system that currently
works.

> (a) once the transaction's failed, you can't go making catalog updates; 

Maybe we can defer the actual update to some other transaction -- say
register an autovacuum work-item, which can be executed separately.

> (b) even when you know the transaction's failed, blaming it on a
> particular index seems a bit chancy; 

Well, vacuum knows what index is being processed.  Maybe you're thinking
that autovac can get an out-of-memory condition or something like that;
perhaps we can limit the above only when an ERRCODE_DATA_CORRUPTED
condition is reported (and make sure all such conditions do that.  As
far as I remember we have a patch for this particular error to be
reported as such.)

> (c) automatically disabling constraint indexes seems less than desirable.

Disabling them for writes, yeah.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Looking for Postgres upgrade Matrix

2019-07-17 Thread David G. Johnston
On Wed, Jul 17, 2019 at 11:39 AM Perumal Raj  wrote:

> Thanks Adrian, David,
>
> Basically , i want to upgrade few 9.X/8.X version DBs  to some stable
> version ( 10.X / 11.X ), At the same time with less down time.
> So want to understand whether direct upgrade possible or not between major
> releases .
>

From the pg_upgrade documentation:

"pg_upgrade supports upgrades from 8.4.X and later to the current major
release of PostgreSQL, including snapshot and beta releases."

You demonstrated knowledge of the two relevant programs that can be used to
upgrade and their documentation explicitly states their minimum version
limit so I'm not understanding why there is a question.  pg_upgrade is the
better option for upgrading.

David J.


Resolved: Looking for Postgres upgrade Matrix

2019-07-17 Thread Perumal Raj
Ok, thanks for the clarification.



Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 11:43 AM Alvaro Herrera
 wrote:
> This might make things worse operationally, though.  If searches aren't
> failing but vacuum is, we'd break a production system that currently
> works.

If searches aren't failing and VACUUM works, then that's probably down
to dumb luck. The user's luck could change at any time (actually, it's
quite possible that the index is already giving wrong answers without
anybody realizing). That's not always true, of course -- you could
have an OOM condition in VACUUM, where it really does make sense to
retry. But it should be true for the category of errors where we
behave more aggressively than just giving up, such as "failed to
re-find parent key" error Aaron noticed.

> Well, vacuum knows what index is being processed.  Maybe you're thinking
> that autovac can get an out-of-memory condition or something like that;
> perhaps we can limit the above only when an ERRCODE_DATA_CORRUPTED
> condition is reported (and make sure all such conditions do that.  As
> far as I remember we have a patch for this particular error to be
> reported as such.)

I don't think that it would be that hard to identify errors that
nbtree VACUUM could throw that clearly indicate corruption, without
any hope of the problem self-correcting without the DBA running a
REINDEX. There will be a small amount of gray area, perhaps, but
probably not enough to matter.

> > (c) automatically disabling constraint indexes seems less than desirable.
>
> Disabling them for writes, yeah.

I think that it's fair to say that all bets are off once you see the
"failed to re-find parent key" error, or any other such error that
indicates corruption. Admittedly it isn't 100% clear that disabling
constraint enforcement to unblock autovacuum for the whole cluster is
better than any available alternative; it's really hard to reason
about things when we already know that the database has corruption.

I think that it's okay that almost anything can break when somebody
creates an index on a non-immutable expression (including VACUUM),
provided that all the problems only affect the table with the broken
index. OTOH, taking down the entire Postgres cluster as an indirect
consequence of one person's ill-considered CREATE INDEX really sucks.
That distinction seems important to me.

-- 
Peter Geoghegan




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 10:27 AM Peter Geoghegan  wrote:
> > It's possible that amcheck would have given you an accurate diagnosis
> > of the problem -- especially if you used bt_index_parent_check():
> >
> > https://www.postgresql.org/docs/current/amcheck.html
>
> BTW, be sure to use the 'heapallindexed' option with
> bt_index_parent_check() to detect missing downlinks, which is exactly
> the problem that VACUUM complained about.

Can you tell us more about this index? Can you share its definition
(i.e. what does \d show in psql)?

Is it an expression index, or a partial index? A composite? What
datatypes are indexed?

Thanks
--
Peter Geoghegan




Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Andres Freund
Hi,

On 2019-07-17 13:27:23 -0400, Tom Lane wrote:
> My concern here is that if we have blinders on to the extent of only
> processing that one table or DB, we're unnecessarily allowing bloat to
> occur in other tables, and causing that missed vacuuming work to pile
> up so that there's more of it to be done once the breakage is cleared.

That's a pretty common problem in the real world, with or without problems
like corrupted indexes. Autovacuum's scheduling is just too simplistic
to avoid that.

Common problems:

- If one table in one database has an xmin older than
  autovacuum_freeze_max_age, the autovacuum launcher continuously throws
  more workers at that database. Even though there's a lot of work to be
  done in other databases.

  That's because do_start_worker() ignores the "adl_next_worker"
  mechanism, and *always* starts more workers for the database with the
  oldest datfrozenxid (same with datminmxid), and because we explicitly
  don't start workers for other databases ("/* ignore not-at-risk DBs
  */").

  That's especially bad if that database has a longrunning transaction
  preventing cleanup from happening - the vacuuming will not be able to
  advance the horizon, and all others are starved.


- If there's a table "early in pg_class", and vacuum on it fails, it
  will prevent vacuuming other tables. It's pretty common for vacuum to
  fail often for a table, e.g. because it's sometimes exclusively locked,
  which then causes autovacuum to kill itself.  There's absolutely no
  mechanism for autovacuum workers to skip over that table for a while,
  leading to all other tables in the database not being vacuumed, unless
  there happens to be second worker in the database, while the first
  vacuum hasn't failed.

  This obviously also applies to the corrupted index case.

  The 'kill itself' behaviour is exacerbated by lazy_truncate_heap()'s
  exclusive lock - which will obviously trigger other backend to send
  cancellation requests. There's unfortunately a number of places where
  that leads us to just throw all the work done away, and not update
  pg_class.relfrozenxid/datfrozenxid


- Anti-wraparound vacuums are more impactful (don't cancel themselves
  upon lock conflicts, cause more IO, wait for cleanup locks), often
  emit scary messages ("oldest xmin is far in the past").  But we don't
  have *any* mechanism that avoids them for very common scenarios.

  E.g. for insert-mostly workloads, there'll never be enough dead tuples
  to cause a vacuum to happen before autovacuum_freeze_max_age is
  reached. That prevents vacuum_freeze_table_age from pre-empting the
  need to do an anti-wraparound vacuum, by increasing the xid horizon.

  We literally call anti-wraparound autovacuums "emergency" in the code,
  yet they're an almost inevitable part of running postgres.


- There's no meaningful in-database prioritization. Which means that
  autovacuum workers might happily vacuum the table just a bit over the
  thresholds, even though there's much worse tables around.

  Especially on a busy and large databases that can lead to
  anti-wraparound started launchers effectively never getting to
  vacuuming tables above autovacuum_freeze_max_age, because tables
  earlier in pg_class are modified heavily enough that they have dead
  tuples above the thresholds by the time vacuum finishes.  To get to
  the anti-wraparound vacuum needing table, a single launcher needs to
  go through all tables preceding the table in pg_class that need
  vacuuming (only skipping over ones that are concurrently vacuumed by
  somebody else, but not ones that have *recently* been vacuumed).

  I kinda forgot how bad this one was until looking at the code again.

And there are plenty more.


My impression is that these are really hard to fix unless we develop a
new scheduling approach. And that scheduling approach probably needs to
be more stateful than the current code.

IMO these are the main concerns for how work needs to be distributed:

- work in different databases needs to be scheduled in a smarter way, in
  particular anti-wraparound cannot simply cause only the database with
  the oldest datfrozenxid to be vacuumed until the wraparound
  "danger" is over (especially not if there's other database needing to
  be anti-wrap vacuumed)

- tables within a database need to be prioritized in a smarter way, so
  databases with a lot of bloat get vacuumed before ones with a lot less
  bloat, and similarly tables with the oldest horizon need to be
  vacuumed before ones with newer horizons, even if all of the tables
  are above the autovacuum thresholds.

- tables need to be prioritized across databases, to avoid problems like
  one currently vacuumed table causing unnecessary anti-wraparound
  workers to be launched for a database where they can't perform any
  work.

- there needs to be a separate rung between normal autovacuums and
  anti-wraparound vacuums. The inbetween level does 'aggressive'
  vacuuming (so it free

pg_stat_progress_vacuum comes up empty ...?

2019-07-17 Thread Michael Harris
Hello,

We have a database cluster which recently got very close to XID wraparound. To
get it back under control I've been running a lot of aggressive manual vacuums.

However, I have noticed a few anomalies. When I try to check the status of 
vacuum commands:

qtodb_pmxtr=# select * from  pg_stat_progress_vacuum;
 pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | 
heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-+---+-+---+---+-+---+++-+-
(0 rows)

Yet there definitely are plenty running:

qtodb_pmxtr=# select pid, state, current_timestamp-query_start as 
duration,query from pg_stat_activity where datname='qtodb_pmxtr' and 
query~'VACUUM' ;
  pid  | state  |duration | 
  query
---++-+---
 40615 | active | 13:46:35.081203 | autovacuum: VACUUM 
qn.mtrds_cnestmeas_oo_18032
 40617 | active | 00:46:35.270232 | autovacuum: VACUUM 
qn.mtrds_cantu100_oo_18046
 40622 | active | 00:00:04.55167  | autovacuum: VACUUM 
qn.mtrds_cbeekops_on_17684 (to prevent wraparound)
 25685 | active | 00:00:04.568989 | VACUUM FREEZE pg_toast.pg_toast_228072029;
 25686 | active | 00:00:02.716111 | VACUUM FREEZE pg_toast.pg_toast_228072943;
 25687 | active | 00:00:03.788131 | VACUUM FREEZE pg_toast.pg_toast_228069006;
 25688 | active | 00:00:02.531885 | VACUUM FREEZE pg_toast.pg_toast_228067023;
 25689 | active | 00:00:02.098389 | VACUUM FREEZE pg_toast.pg_toast_228071980;
 25690 | active | 00:00:00.621036 | VACUUM FREEZE pg_toast.pg_toast_228071852;
 25691 | active | 00:00:11.424717 | VACUUM FREEZE pg_toast.pg_toast_228069597;
 25692 | active | 00:00:03.359416 | VACUUM FREEZE pg_toast.pg_toast_228073892;
 25693 | active | 00:00:04.569248 | VACUUM FREEZE pg_toast.pg_toast_228068022;
 25694 | active | 00:00:20.151786 | VACUUM FREEZE pg_toast.pg_toast_228068878;
 25695 | active | 00:00:00.517688 | VACUUM FREEZE pg_toast.pg_toast_228068478;
 25696 | active | 00:00:23.746402 | VACUUM FREEZE pg_toast.pg_toast_228067431;
 25697 | active | 00:00:10.759025 | VACUUM FREEZE pg_toast.pg_toast_228072997;
 25698 | active | 00:00:14.281798 | VACUUM FREEZE pg_toast.pg_toast_228074613;
 25699 | active | 00:00:05.631052 | VACUUM FREEZE pg_toast.pg_toast_228074247;
 25700 | active | 00:00:00.056749 | VACUUM FREEZE pg_toast.pg_toast_228071681;
 28008 | active | 00:00:00| select pid, state, 
current_timestamp-query_start as duration,query from pg_stat_activity where 
datname='qtodb_pmxtr' and query~'VACUUM' ;
(20 rows)

Why don't any of these (manual OR auto) show up in the pg_stat_progress_vacuum?

Another concern: the normal autovacuums seem to be stalling. The table 
qn.mtrds_cnestmeas_oo_18032 should surely not take more than 13 hours to
vacuum, since it is only 160KB in size ...!

qtodb_pmxtr=# select 
pg_size_pretty(pg_relation_size('qn.mtrds_cnestmeas_oo_18032'::regclass));
 pg_size_pretty
----------------
 160 kB
(1 row)

We have autovacuum_cost_delay set to 0.

I also don't understand why only one autovac worker is working on the
wraparound issue, as there are thousands of tables with oldest xid > 
autovacuum_freeze_max_age.
I would have thought it would be prioritizing those.
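One way to list them (a sketch):

SELECT c.oid::regclass AS table_name, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
  AND age(c.relfrozenxid) > current_setting('autovacuum_freeze_max_age')::int
ORDER BY age(c.relfrozenxid) DESC;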

I'm worried that something is wrong with autovacuum on this database, which
might be responsible for it getting into this state to begin with. Other
similar databases we have, running the same application and with similar
configuration, are managing to keep up with the xid freezing nicely.

The database was on 9.6, but was recently upgraded to 11.4.

Any advice welcome!

Cheers
Mike.






Re: Change in db size

2019-07-17 Thread Luca Ferrari
On Wed, Jul 17, 2019 at 7:33 PM Sonam Sharma  wrote:
>
> I took the backup using pg_dump with gzip option and restored it with psql.

The way you backed it up does not change things: chances are that when you
restored it, the database dropped the table and index bloat, as already
mentioned.
The other, remote, chance is that you did not back up all the objects; in
that case you should show us your backup command line (and the restore one
too), and also the output of commands like \l+, \dt+, \di+ and the
differences you have found.

Luca