Thanks, Tom.
But I think we may be able to provide a better experience. Consider the example below:

[jet@halodev-jet-01 data]$ psql
psql (16.6)
Type "help" for help.

postgres=# CREATE TABLE a_test (n INT);
CREATE TABLE
postgres=# INSERT INTO a_test VALUES (1);
INSERT 0 1
postgres=# 2024-12-23 16:56:11.023 CST [1356476] FATAL: terminating connection due to unexpected postmaster exit
postgres=#
postgres=# \q
[jet@halodev-jet-01 data]$

### Here we simulate a crash and clog file corruption (delete the clog file; see the reproduction sketch at the end of this message).

[jet@halodev-jet-01 data]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2024-12-23 16:57:24.036 CST [1356495] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv6 address "::1", port 5432
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-12-23 16:57:24.046 CST [1356495] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-12-23 16:57:24.055 CST [1356498] LOG: database system was interrupted; last known up at 2024-12-23 16:54:56 CST
2024-12-23 16:57:24.147 CST [1356498] LOG: database system was not properly shut down; automatic recovery in progress
2024-12-23 16:57:24.151 CST [1356498] LOG: redo starts at 0/14E4D20
2024-12-23 16:57:24.152 CST [1356498] LOG: file "pg_xact/0000" doesn't exist, reading as zeroes
2024-12-23 16:57:24.152 CST [1356498] CONTEXT: WAL redo at 0/14FCAB0 for Transaction/COMMIT: 2024-12-23 16:55:13.531244+08; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384
2024-12-23 16:57:24.152 CST [1356498] LOG: invalid record length at 0/14FCD20: expected at least 24, got 0
2024-12-23 16:57:24.152 CST [1356498] LOG: redo done at 0/14FCCE8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-12-23 16:57:24.157 CST [1356496] LOG: checkpoint starting: end-of-recovery immediate wait
2024-12-23 16:57:24.184 CST [1356496] LOG: checkpoint complete: wrote 27 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.014 s, total=0.030 s; sync files=22, longest=0.006 s, average=0.001 s; distance=96 kB, estimate=96 kB; lsn=0/14FCD20, redo lsn=0/14FCD20
2024-12-23 16:57:24.188 CST [1356495] LOG: database system is ready to accept connections
 done
server started
[jet@halodev-jet-01 data]$ psql
psql (16.6)
Type "help" for help.

postgres=# SELECT * FROM a_test;
 n
---
 1
(1 row)

postgres=# \q

We can see that when the database restarts, it tries to recover. So I think we could improve database reliability in some scenarios where only the clog file is corrupted.

Regards,
Jet

Tom Lane <t...@sss.pgh.pa.us> wrote on Mon, Dec 23, 2024 at 14:50:

"章晨曦@易景科技" <zhangche...@halodbtech.com> writes:
> And after a while, a system error occurred and, unfortunately, it corrupted just the clog file.
> So we need to restore the database from backup just because of that tiny corrupted clog file.

I'm not seeing a large difference between this complaint and whining because Unix doesn't have a way to recover from "sudo rm -rf /". clog is critical data: if you mess with it you will destroy your database. It is not the only critical data in the system, either.

> Is there any chance to improve this?
We're not in the business of building doubly- or triply-redundant storage. The cost/benefit just isn't attractive for very many people. If you don't trust your hardware, you can put your storage on RAID, or replicate the database, etc.

If you have a DBA who thinks it's cool to remove files they don't understand the purpose of, the answer is to fire that DBA.

        regards, tom lane
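
For reference, here is a minimal sketch of how the crash-and-corruption scenario above could be reproduced. The exact commands were not shown in the original session; using kill -9 on the postmaster and removing pg_xact/0000 from the data directory are assumptions inferred from the log output ("unexpected postmaster exit" and "file \"pg_xact/0000\" doesn't exist, reading as zeroes").

    # Run from inside the data directory, with the server already running
    # and PGDATA pointing at it.
    psql -c "CREATE TABLE a_test (n INT);"
    psql -c "INSERT INTO a_test VALUES (1);"

    # Simulate a hard crash: kill the postmaster without a clean shutdown.
    # (The first line of postmaster.pid holds the postmaster PID.)
    kill -9 $(head -1 postmaster.pid)

    # Simulate clog corruption by deleting the transaction status file.
    rm pg_xact/0000

    # Restart: crash recovery replays the WAL and reads the missing
    # pg_xact page as zeroes, as shown in the log above.
    pg_ctl start
    psql -c "SELECT * FROM a_test;"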