Postgres complete recovery. Hello experts,
i am working on postgres backup and recovery, with pg_basebackup utility. for some reason i was not able to perform complete recovery, recovery is completed till the available wal log not not able to perform complete recovery committed transaction on wal logs. any one performed complete database recovery ? if yes, kindly share the procedure. here is the test case i have done wal log setting on postgres =====>>> enable archive log mode for a postgres cluster wal_level = replica # minimal, replica, or logical archive_mode = off # enables archiving; off, on, or always archive_command = 'cp %p /pgdata/archive/5444/%f' # command to use to archive a logfile segment ======>>>> verify archive log setting [postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444 postgres postgres=# show archive_mode; archive_mode -------------- on (1 row) postgres=# show archive_command postgres-# ; archive_command ------------------------------- cp %p /pgdata/archive/5444/%f (1 row) postgres=# show wal_level; wal_level ----------- replica (1 row) ===== >>> force manual swtich postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/13A7EBC8 (1 row) postgres=# \! ls -l /pgdata/archive/5444/ total 32768 -rw-------. 1 postgres postgres 16777216 Mar 4 04:53 000000010000000000000013 -rw-------. 1 postgres postgres 16777216 Mar 4 04:54 000000010000000000000014 postgres=# to perform hot backup, with low level api 1. login to postgres server with psql 2. enable start backup (select pg_start_backup('hotbackup_20190304'); 3. use os command to copy data, tablespace, wal location to backup media 4. stop backup (select pg_stop_backup('hotbackup_20190304');) pg_basebackup tool . makes binary copy of database cluster files it will automatically put database in backup mode and ends once backup is completed. it uses stream protocol below parameter has to be st for pg_basebackup wal_level = replica # minimal, replica, or logical archive_mode = off # enables archiving; off, on, or always archive_command = 'cp %p /pgdata/archive/5444/%f' # command to use to archive a logfile segment #max_wal_senders = 10 # max number of walsender processes modify hba_file.conf # Allow replication connections from localhost, by a user with the # replication privilege. local replication postgres md5 host replication postgres 127.0.0.1/32 md5 host replication postgres ::1/128 md5 #Allow remote replication connections from localhost, by a user with the host replication <username> <ip adderess>/32 md5 restart database pg_ctl -D $PWD stop run pg_Basebackup [postgres@pgsql-lnx1 online]$ pg_basebackup -h localhost -p 5444 -D /pgbkup/backup/fsbackup/online -X stream -v > /tmp/hotbackup.log 2>&1 Password: note -X is used to copy latest wal segment (16mb) to self start datababse after restore and recovery view the file /tmp/hotbackup.log file to see the backup information. [postgres@pgsql-lnx1 online]$ cat /tmp/hotbackup.log pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/19000028 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: write-ahead log end point: 0/19000130 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: base backup completed [postgres@pgsql-lnx1 online]$ [postgres@pgsql-lnx1 online]$ pwd /pgbkup/backup/fsbackup/online [postgres@pgsql-lnx1 online]$ cat backup_label START WAL LOCATION: 0/19000028 (file 000000010000000000000019) CHECKPOINT LOCATION: 0/19000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-04-04 05:44:46 CDT LABEL: pg_basebackup base backup [postgres@pgsql-lnx1 online]$ ====== >>>>>perform recovery: create table test and insert few rows ltuser=> create table test (eno int); CREATE TABLE ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> select count(*) from test; count ------- 10000 (1 row) ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> select count(*) from test; count ------- 20000 (1 row) ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> commit; WARNING: there is no transaction in progress COMMIT ltuser=> select count(*) from test; count ------- 40000 (1 row) ltuser=> \! ls -lart /pgdata/archive/5444/ total 114700 drwxrwxr-x. 4 postgres postgres 30 Mar 4 04:41 .. -rw-------. 1 postgres postgres 16777216 Mar 4 04:53 000000010000000000000013 -rw-------. 1 postgres postgres 16777216 Mar 4 04:54 000000010000000000000014 -rw-------. 1 postgres postgres 16777216 Mar 4 05:39 000000010000000000000015 -rw-------. 1 postgres postgres 16777216 Mar 4 05:43 000000010000000000000016 -rw-------. 1 postgres postgres 305 Mar 4 05:43 000000010000000000000017.00000028.backup -rw-------. 1 postgres postgres 16777216 Mar 4 05:43 000000010000000000000017 -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000018 -rw-------. 1 postgres postgres 305 Mar 4 05:44 000000010000000000000019.00000028.backup drwxrwxr-x. 2 postgres postgres 4096 Mar 4 05:44 . -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000019 ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal total 213004 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001B -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001C -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001D -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001E -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001F -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000020 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000021 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000022 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000024 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000023 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000025 drwx------. 19 postgres postgres 4096 Mar 4 05:39 .. -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000019 -rw-------. 1 postgres postgres 305 Mar 4 05:44 000000010000000000000019.00000028.backup drwx------. 3 postgres postgres 4096 Mar 4 05:59 . drwx------. 2 postgres postgres 96 Mar 4 05:59 archive_status -rw-------. 1 postgres postgres 16777216 Mar 4 06:24 00000001000000000000001A ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> insert into test values (generate_series(1,10000)); INSERT 0 10000 ltuser=> select count(*) from test; count ------- 70000 (1 row) ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal total 213004 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001B -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001C -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001D -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001E -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 00000001000000000000001F -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000020 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000021 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000022 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000024 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000023 -rw-------. 1 postgres postgres 16777216 Mar 4 03:08 000000010000000000000025 drwx------. 19 postgres postgres 4096 Mar 4 05:39 .. -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000019 -rw-------. 1 postgres postgres 305 Mar 4 05:44 000000010000000000000019.00000028.backup drwx------. 3 postgres postgres 4096 Mar 4 05:59 . drwx------. 2 postgres postgres 96 Mar 4 05:59 archive_status -rw-------. 1 postgres postgres 16777216 Mar 4 06:25 00000001000000000000001A >>>>> file has latest table test table , log swith not completed so far ltuser=> \! ls -lart /pgdata/archive/5444/ total 114700 drwxrwxr-x. 4 postgres postgres 30 Mar 4 04:41 .. -rw-------. 1 postgres postgres 16777216 Mar 4 04:53 000000010000000000000013 -rw-------. 1 postgres postgres 16777216 Mar 4 04:54 000000010000000000000014 -rw-------. 1 postgres postgres 16777216 Mar 4 05:39 000000010000000000000015 -rw-------. 1 postgres postgres 16777216 Mar 4 05:43 000000010000000000000016 -rw-------. 1 postgres postgres 305 Mar 4 05:43 000000010000000000000017.00000028.backup -rw-------. 1 postgres postgres 16777216 Mar 4 05:43 000000010000000000000017 -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000018 -rw-------. 1 postgres postgres 305 Mar 4 05:44 000000010000000000000019.00000028.backup drwxrwxr-x. 2 postgres postgres 4096 Mar 4 05:44 . -rw-------. 1 postgres postgres 16777216 Mar 4 05:44 000000010000000000000019 ltuser=> \q [postgres@pgsql-lnx1 pg_wal]$ pwd ==== >> so far 70000 rows are inserted in test table on ltuser database our backup was completed on START WAL LOCATION: 0/19000028 (file 000000010000000000000019) CHECKPOINT LOCATION: 0/19000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-04-04 05:44:46 CDT >>>>>> LABEL: pg_basebackup base backup ===== >>> lets simulate crash [postgres@pgsql-lnx1 pg_wal]$ ps -ef |grep post root 1256 1 0 Mar03 ? 00:00:00 /usr/libexec/postfix/master -w postfix 1284 1256 0 Mar03 ? 00:00:00 qmgr -l -t unix -u postgres 4104 1 0 Mar03 ? 00:00:02 /opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir postgres 4106 4104 0 Mar03 ? 00:00:03 postgres: checkpointer process postgres 4107 4104 0 Mar03 ? 00:00:01 postgres: writer process postgres 4108 4104 0 Mar03 ? 00:00:03 postgres: wal writer process postgres 4109 4104 0 Mar03 ? 00:00:04 postgres: autovacuum launcher process postgres 4110 4104 0 Mar03 ? 00:00:08 postgres: stats collector process postgres 4111 4104 0 Mar03 ? 00:00:00 postgres: bgworker: logical replication launcher root 12813 12760 0 04:37 pts/3 00:00:00 su - postgres postgres 12814 12813 0 04:37 pts/3 00:00:00 -bash postfix 14203 1256 0 05:26 ? 00:00:00 pickup -l -t unix -u postgres 14546 1 0 05:39 pts/3 00:00:00 /opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir_clone postgres 14548 14546 0 05:39 ? 00:00:00 postgres: checkpointer process postgres 14549 14546 0 05:39 ? 00:00:00 postgres: writer process postgres 14550 14546 0 05:39 ? 00:00:00 postgres: wal writer process postgres 14551 14546 0 05:39 ? 00:00:00 postgres: autovacuum launcher process postgres 14552 14546 0 05:39 ? 00:00:00 postgres: archiver process last was 000000010000000000000019 postgres 14553 14546 0 05:39 ? 00:00:00 postgres: stats collector process postgres 14554 14546 0 05:39 ? 00:00:00 postgres: bgworker: logical replication launcher root 15199 15135 0 06:00 pts/2 00:00:00 su - postgres postgres 15200 15199 0 06:00 pts/2 00:00:00 -bash postgres 15892 15200 0 06:26 pts/2 00:00:00 ps -ef postgres 15893 15200 0 06:26 pts/2 00:00:00 grep --color=auto post [postgres@pgsql-lnx1 pg_wal]$ kill -9 14546 ===== >>>> remove files under data directory [postgres@pgsql-lnx1 datadir_clone]$ rm -rf * [postgres@pgsql-lnx1 datadir_clone]$ ls -lart total 4 drwxr-xr-x. 9 postgres postgres 4096 Mar 4 04:41 .. drwx------. 2 postgres postgres 6 Mar 4 06:28 . ===== >>>> try to start database [postgres@pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start pg_ctl: directory "/pgdata/datadir_clone" is not a database cluster directory [postgres@pgsql-lnx1 datadir_clone]$ pwd /pgdata/datadir_clone [postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444 psql: could not connect to server: Connection refused Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5444"? ===== >>> let's restore from backup [postgres@pgsql-lnx1 datadir_clone]$ cd /pgbkup/backup/fsbackup/online/ [postgres@pgsql-lnx1 online]$ ls backup_label global pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf postgresql.conf.orig base logfile pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf [postgres@pgsql-lnx1 online]$ cp -rp * /pgdata/datadir_clone ==== >>> create recovery command file under data directory cat recovery.conf restore_command = 'cp /pgdata/archive/5444/%f %p' ==== >>> start recovery [postgres@pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start waiting for server to start....2019-04-04 06:30:40.174 CDT [16043] LOG: listening on IPv6 address "::1", port 5444 2019-04-04 06:30:40.174 CDT [16043] LOG: listening on IPv4 address "127.0.0.1", port 5444 2019-04-04 06:30:40.177 CDT [16043] LOG: listening on Unix socket "/tmp/.s.PGSQL.5444" 2019-04-04 06:30:40.202 CDT [16044] LOG: database system was interrupted; last known up at 2019-04-04 05:44:46 CDT 2019-04-04 06:30:40.490 CDT [16044] LOG: starting archive recovery >>>>>> ****** 2019-04-04 06:30:40.555 CDT [16044] LOG: restored log file "000000010000000000000019" from archive 2019-04-04 06:30:40.609 CDT [16044] LOG: redo starts at 0/19000028 2019-04-04 06:30:40.612 CDT [16044] LOG: consistent recovery state reached at 0/19000130 2019-04-04 06:30:40.613 CDT [16043] LOG: database system is ready to accept read only connections cp: cannot stat ‘/pgdata/archive/5444/00000001000000000000001A’: No such file or directory 2019-04-04 06:30:40.621 CDT [16044] LOG: redo done at 0/19000130 2019-04-04 06:30:40.662 CDT [16044] LOG: restored log file "000000010000000000000019" from archive >>>.**** done server started >>>> ***** [postgres@pgsql-lnx1 datadir_clone]$ cp: cannot stat ‘/pgdata/archive/5444/00000002.history’: No such file or directory >>>*** error 2019-04-04 06:30:40.721 CDT [16044] LOG: selected new timeline ID: 2 2019-04-04 06:30:40.792 CDT [16044] LOG: archive recovery complete cp: cannot stat ‘/pgdata/archive/5444/00000001.history’: No such file or directory 2019-04-04 06:30:40.906 CDT [16043] LOG: database system is ready to accept connections >>>> *** opened database ===== >>> let's validate [postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444 psql (10.6) Type "help" for help. postgres=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+-----------+----------+-------------+-------------+------------------------------ fal1dbd | fal1admin | UTF8 | en_US.UTF-8 | en_US.UTF-8 | fal1admin=CTc/fal1admin + | | | | | =T/fal1admin + | | | | | visa_system_role=c/fal1admin fal2dbd | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres + | | | | | =T/postgres + | | | | | visa_system_role=c/postgres fal3dbd | fal3admin | UTF8 | en_US.UTF-8 | en_US.UTF-8 | fal3admin=CTc/fal3admin + | | | | | =T/fal3admin ltuser | ltuser | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres + | | | | | =T/postgres + | | | | | visa_system_role=c/postgres template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres + | | | | | =c/postgres (7 rows) postgres=# \c ltuser ltuser You are now connected to database "ltuser" as user "ltuser". ltuser=> \dt List of relations Schema | Name | Type | Owner --------+------------+-------+-------- public | customer | table | ltuser public | district | table | ltuser public | history | table | ltuser public | item | table | ltuser public | new_order | table | ltuser public | order_line | table | ltuser public | orders | table | ltuser public | stock | table | ltuser public | warehouse | table | ltuser (9 rows) ltuser=> ltuser=> exit ===== >>>> validation results since we lost every thing, it only performed recovery till last available log in hotbackup. since we lost every thing, it only performed recovery till last available log in hotbackup. latest data in current wal log is lost and there we have test table. lost of latest wal file/log will give some data loss, need to find a way to have redundant copy on latest wall logfile -- *Regards,#! Pavan Kumar----------------------------------------------*- *Sr. Database Administrator..!* *NEXT GENERATION PROFESSIONALS, LLC* *Cell # 267-799-3182 # pavan.dba27 (Gtalk) * *India # 9000459083* *Take Risks; if you win, you will be very happy. If you lose you will be Wise *