PITR on DROP DATABASE, deleting of the database directory despite the recovery_target_time set before.

2019-11-18 Thread Nicolas Lutic
d 574; 1 xacts: 574|
|
|
|rmgr: XLOG    len (rec/tot):    106/   106, tx:  0, lsn:
0/03000768, prev 0/03000730, desc: CHECKPOINT_ONLINE redo 0/3000730; tli
1; prev tli 1; fpw true; xid 0:575; oid 24576; multi 1; offset 0; oldest
xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp
xid: 0/0; oldest running xid 574; online|
|
|
|rmgr: Database    len (rec/tot): 34/    34, tx:    574, lsn:
0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663|
|
|
|rmgr: Transaction len (rec/tot): 66/    66, tx:    574, lsn:
0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426
CET; inval msgs: catcache 21; sync|
|
|


We notice that the following log

|rmgr: Database    len (rec/tot): 34/    34, tx:    574, lsn:
0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663|

is executed between the last commit that we are interested inand the
next record with a timestamp

|rmgr: Transaction len (rec/tot): 66/    66, tx:    574, lsn:
0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426
CET; inval msgs: catcache 21; sync|
|
|

We understand that the drop database command is not transactional but
the drop dir is attached to the xact whose xid has a commit with
timestime out of recovery_target_time bound.
On the other hand, DBA role is to determine which at which xact recovery
should stop and define recovery_target_xid rather than
recovery_target_time.
Humans are prone to use natural things such as time to define "when" to
stop or start things.
We know that this rarely happens in production, because you can't drop a
database if users are still connected. But with the new force drop
database option, it might be a reasonable choice to improve the
situation with that recovery_target_time directive.

It turns out there are two different choices we can make : 


  *     Change recovery behaviour in that case to prevent all xact
operation to perform until COMMIT timestamp is checked against
recovery_time bound (but it seems to be difficult as
state 
https://www.postgresql.org/message-id/flat/20141125160629.GC21475%40msg.df7cb.dewhich
also identifies the problem and tries to give some solutions.  Maybe
another way, as a trivial guess (all apologises) is to buffer
immediate xacts until we have the commit for each and apply the
whole buffer xact once the timestamp known (and checked agains
recovery_target_time value);

  *     The other way to improve this is to update PostgreSQL
documentation  by specifying that recovery_target_time cannot be
used in this case.There should be multiple places where it can be
stated. The best one (if only one) seems to be in 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=f83770350eda5625179526300c652f23ff29c9fe;hb=HEAD#l3400
 


We are willing to help on this case either with code patching or
documentation improvement.

Best regards,

-- 
LOXODATA
Nicolas Lutic




Re: PITR on DROP DATABASE, deleting of the database directory despite the recovery_target_time set before.

2019-11-19 Thread Nicolas Lutic


On 11/19/19 1:40 AM, Craig Ringer wrote:
> On Mon, 18 Nov 2019 at 18:48, Nicolas Lutic  <mailto:n.lu...@loxodata.com>> wrote:
> 
> Dear Hackers,
> 
> After a drop database
> 
> 
> with FORCE?
No, we tested with PostgreSQL v 11 and we don't have this option.
>  
> 
> , he tried to recover the data on the last inserted transaction by
> using the recovery_target_time.
> The issue is the database is present in the system catalog but the
> directory was still deleted.
> Here the technical information of the database
> version 11
> default  postgresql.conf except for this options
>     wal_level = replica
>     archive_mode = on
>     archive_command = 'cp %p /tmp/wal_archive/%f '
>     log_statement = 'all'
>     log_min_messages = debug5
> 
>   
> The following method was used 
> 
>   * create cluster
> 
>   * create database
> 
>   * create 1 table 
> 
>   * create 1 index on 1 column
> 
>   * insert 1 rows
> 
>   * backup with pg_base_backup
> 
>   * insert 2 rows
> 
> autocommit? 

Yes, I forgot to mention it.

> 
>   * drop database
> 
> force?
>  
> 
>   *     Change recovery behaviour in that case to prevent all xact
> operation to perform until COMMIT timestamp is checked against
> recovery_time bound (but it seems to be difficult as
> state 
> https://www.postgresql.org/message-id/flat/20141125160629.GC21475%40msg.df7cb.dewhich
> also identifies the problem and tries to give some solutions. 
> Maybe another way, as a trivial guess (all apologises) is to
> buffer immediate xacts until we have the commit for each and
> apply the whole buffer xact once the timestamp known (and
> checked agains recovery_target_time value);
> 
>   *     The other way to improve this is to update PostgreSQL
> documentation  by specifying that recovery_target_time cannot be
> used in this case.There should be multiple places where it can
> be stated. The best one (if only one) seems to be in 
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=
> 
> <https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=f83770350eda5625179526300c652f23ff29c9fe;hb=HEAD#l3400>
> 
> 
> If this only happens when a DB is dropped under load with force, I lean
> toward just documenting it as a corner case.

This can happen in the case of a non-transactional instruction, DROP
DATABASE (with or without FORCE) is one case but there may be other cases ?

The documentation modification have to mention this case and list the
other most likely operations.

An idea, without insight knowledge of the code, in case of
recovery_target_time (only), would be to move forward each record for an
xact.

Each record that is «timestamped» can be applied but once we encounter a
non timestamped record we could buffer the following records for any
xaxts until a timestamped commit/rollback for the transaction where that
non transactionnal op appearsin. Once the commit/rollback records are
found, there's two options :
1) the commit/rollback timestamp is inside the "replay" bound, then the
whole buffer can be applied
2) the commit/rollback timestamp is beyond the upper time bound for
"replay", then  the whole buffer for that transaction could be canceled.
This can only be done on DROP DATABASE "DELETE" operation ?
Maybe, this will lead to skewed pages and this is a wrong way to do such
a thing.

Another assumption is that "DROP DATABASE" sequence can be changed for
this operation to perform correctly.

We are aware that this part is tricky and will have little effects on
normal operations, as best practices are to use xid_target or lsn_target.


> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise

Best regards

-- 
LOXODATA https://www.loxodata.com/
Consulting - Training - Support
Nicolas Lutic
Consultant trainer





planner chooses incremental but not the best one

2023-12-12 Thread Nicolas Lutic
 processing_state = 
'PENDING_PROCESSING'::text


Please find attached the 3 plans

explain_analyse_incremental_off.txt with enable_incremental_sort to off
explain_analyse_incremental_on.txt with enable_incremental_sort to on
explain_analyse_incremental_on_limit5000 with enable_incremental_sort to 
on but with increase the limit to 5000, in this case plan choose don't 
use `Incremental Sort`


The point that I don't understand in the plan (incremental_sort to on) 
is the top level one, the limit cost doesn't seem right.


Limit  (cost=324.05..16073.82 rows=50 width=44) (actual 
time=1663688.290..1663696.151 rows=50 loops=1)

   Buffers: shared hit=114672881 read=5725197 dirtied=38564 written=24394
   I/O Timings: shared/local read=1481378.069 write=313.574
   ->  Incremental Sort  (cost=324.05..27838050.13 rows=88375 width=44) 
(actual time=1663688.289..1663696.144 rows=50 loops=1)



Have you a explaination on the behaviour ?

Best regards


--
Nicolas Lutic

   QUERY 
PLAN

-
 Limit  (cost=262056.35..262056.47 rows=50 width=44) (actual 
time=190.836..190.859 rows=50 loops=1)
   Buffers: shared hit=165374 dirtied=1
   ->  Sort  (cost=262056.35..262277.28 rows=88375 width=44) (actual 
time=190.835..190.848 rows=50 loops=1)
 Sort Key: inputdocum0_.created_at, inputdocum0_.reception_id, 
inputdocum0_.reception_order
 Sort Method: top-N heapsort  Memory: 31kB
 Buffers: shared hit=165374 dirtied=1
 ->  Append  (cost=0.29..259120.59 rows=88375 width=44) (actual 
time=5.260..190.514 rows=1078 loops=1)
   Buffers: shared hit=165374 dirtied=1
   ->  Index Scan using 
input_document_00_indexation_activity_id_indexation_domain_idx1 on 
input_document_00 inputdocum0__1  (cost=0.29..12844.44 rows=4386 width=44) 
(actual time=5.259..9.211 rows=56 loops=1)
 Index Cond: (indexation_domain_id = 
'2d29daf6-e151-479a-a52a-78b08bb3009d'::uuid)
 Filter: ((locked_at IS NULL) AND (locked_by_app IS NULL) 
AND (locked_by_user IS NULL) AND (lock_time_out IS NULL) AND active_content AND 
(archiving_state <> 'DESTROYED'::text) AND (indexation_subsidiary_id = 
'9f9df402-f70b-40d9-b283-a3c35232469a'::uuid) AND (creation_state = 
'READY'::text))
 Rows Removed by Filter: 6931
 Buffers: shared hit=7989
   ->  Index Scan using 
input_document_01_indexation_activity_id_indexation_domain_idx1 on 
input_document_01 inputdocum0__2  (cost=0.29..12873.65 rows=4352 width=44) 
(actual time=5.529..9.709 rows=53 loops=1)
 Index Cond: (indexation_domain_id = 
'2d29daf6-e151-479a-a52a-78b08bb3009d'::uuid)
 Filter: ((locked_at IS NULL) AND (locked_by_app IS NULL) 
AND (locked_by_user IS NULL) AND (lock_time_out IS NULL) AND active_content AND 
(archiving_state <> 'DESTROYED'::text) AND (indexation_subsidiary_id = 
'9f9df402-f70b-40d9-b283-a3c35232469a'::uuid) AND (creation_state = 
'READY'::text))
 Rows Removed by Filter: 7025
 Buffers: shared hit=8372
   ->  Index Scan using 
input_document_02_indexation_activity_id_indexation_domain_idx1 on 
input_document_02 inputdocum0__3  (cost=0.29..12761.21 rows=4380 width=44) 
(actual time=6.005..9.832 rows=59 loops=1)
 Index Cond: (indexation_domain_id = 
'2d29daf6-e151-479a-a52a-78b08bb3009d'::uuid)
 Filter: ((locked_at IS NULL) AND (locked_by_app IS NULL) 
AND (locked_by_user IS NULL) AND (lock_time_out IS NULL) AND active_content AND 
(archiving_state <> 'DESTROYED'::text) AND (indexation_subsidiary_id = 
'9f9df402-f70b-40d9-b283-a3c35232469a'::uuid) AND (creation_state = 
'READY'::text))
 Rows Removed by Filter: 7053
 Buffers: shared hit=8275
   ->  Index Scan using 
input_document_03_indexation_activity_id_indexation_domain_idx1 on 
input_document_03 inputdocum0__4  (cost=0.29..13019.24 rows=4467 width=44) 
(actual time=5.382..9.862 rows=60 loops=1)
 Index Cond: (indexation_domain_id = 
'2d29daf6-e151-479a-a52a-78b08bb3009d'::uuid)
 Filter: ((locked_