Hi,

I tested this great feature with the latest commit on Red Hat Enterprise 
Linux 7.8. I modified a few blocks in a relation file to cause a checksum 
error. When I then executed the pg_relation_check_pages function, the backend 
terminated abnormally with an assertion failure 
(TRAP: BadArgument("MemoryContextIsValid(context)"), raised from pfree). 
The attached file is the operation log.
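
For anyone who wants to reproduce this: the log only records that the relation 
file was modified, not how. The following is a minimal sketch of one possible 
way to do it, not necessarily the method used here. The helper name 
corrupt_block and its parameters are hypothetical, and it assumes the default 
8192-byte block size. It inverts a few bytes inside one block so that the 
stored page checksum no longer matches the page contents.

```python
import os

BLOCK_SIZE = 8192  # assumption: PostgreSQL default block size (BLCKSZ)


def corrupt_block(path, block_no, offset=4096, nbytes=8, block_size=BLOCK_SIZE):
    """Invert nbytes bytes inside one block of a relation file, in place.

    Hypothetical helper for illustration only. Returns the original bytes
    so the damage can be undone by writing them back at the same position.
    """
    if offset < 0 or nbytes <= 0 or offset + nbytes > block_size:
        raise ValueError("corrupted range must lie inside a single block")
    pos = block_no * block_size + offset
    if pos + nbytes > os.path.getsize(path):
        raise ValueError("block is beyond the end of the file")
    with open(path, "r+b") as f:
        f.seek(pos)
        original = f.read(nbytes)
        f.seek(pos)
        # XOR with 0xFF guarantees every touched byte actually changes.
        f.write(bytes(b ^ 0xFF for b in original))
    return original
```

With the server stopped, as in the attached log, a call such as 
corrupt_block("data.1/base/13751/16384", 0) would damage block 0 of the test 
table. Run this only against a throwaway cluster.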

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Wednesday, October 28, 2020 2:09 PM
To: Julien Rouhaud <rjuju...@gmail.com>
Cc: Justin Pryzby <pry...@telsasoft.com>; Masahiko Sawada 
<masahiko.saw...@2ndquadrant.com>; Robert Haas <robertmh...@gmail.com>; 
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Masahiko Sawada 
<sawada.m...@gmail.com>
Subject: Re: Online checksums verification in the backend

On Tue, Oct 27, 2020 at 07:47:19PM +0800, Julien Rouhaud wrote:
> I think it's also worth noting that the IOLock is now acquired just 
> before getting the buffer state, and released after the read (or after 
> finding that the buffer is dirty).  This is consistent with how it's 
> done elsewhere, so I'm fine.

Consistency is the point.  This API should be safe to use by design.
I have done some extra performance tests similar to what I did upthread, and 
this version showed similar numbers.

> Other than that I'm quite happy with the changes you made, thanks a lot!

Thanks for confirming.  I have gone through the whole set today, split the 
thing into two commits and applied them.  We had buildfarm member florican 
complain about a mistake in one of the GetDatum() calls that I took care of 
already, and there is nothing else on my radar.
--
Michael
$ uname -a
Linux rel78-1 3.10.0-1127.el7.x86_64 #1 SMP Tue Feb 18 16:39:12 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
$
$ pg_config
BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/share/doc
HTMLDIR = /usr/local/pgsql/share/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR = /usr/local/pgsql/share/locale
MANDIR = /usr/local/pgsql/share/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE =  '--enable-debug' '--enable-cassert'
CC = gcc -std=gnu99
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lz -lreadline -lrt -ldl -lm
VERSION = PostgreSQL 14devel
$
$ initdb --no-locale --encoding=utf8 -k data.1
The files belonging to this database system will be owned by user "postgres".
...
Success. You can now start the database server using:

    pg_ctl -D data.1 -l logfile start
$
$ pg_ctl -D data.1 start
waiting for server to start....2020-10-29 08:14:10.772 JST [106109] LOG:  redirecting log output to logging collector process
2020-10-29 08:14:10.772 JST [106109] HINT:  Future log output will appear in directory "log".
 done
server started
$
$ psql
psql (14devel)
Type "help" for help.

postgres=# SELECT pg_control_system();
                       pg_control_system
---------------------------------------------------------------
 (1300,202010281,6888812156269485571,"2020-10-29 08:09:44+09")
(1 row)
postgres=#
postgres=# CREATE TABLE data1(c1 NUMERIC, c2 VARCHAR(10));
CREATE TABLE
postgres=#
postgres=# INSERT INTO data1 VALUES (generate_series(1, 10000), 'data1');
INSERT 0 10000
postgres=#
postgres=# SELECT pg_relation_filepath('data1');
 pg_relation_filepath
----------------------
 base/13751/16384
(1 row)
postgres=# \q
$
$ pg_ctl -D data.1 stop
waiting for server to shut down.... done
$
--- Modify base/13751/16384 file for checksum error
$
$ pg_ctl -D data.1 start
waiting for server to start....2020-10-29 08:15:31.868 JST [106144] LOG:  redirecting log output to logging collector process
2020-10-29 08:15:31.868 JST [106144] HINT:  Future log output will appear in directory "log".
 done
server started
$
$ psql
psql (14devel)
Type "help" for help.

postgres=# SELECT * FROM data1;
WARNING:  page verification failed, calculated checksum 31945 but expected 35621
ERROR:  invalid page in block 0 of relation base/13751/16384
postgres=#
postgres=# SELECT pg_relation_check_pages('data1', 'main');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q

$
$ cat data.1/log/postgresql-2020-10-29_081531.log
2020-10-29 08:15:31.868 JST [106144] LOG:  starting PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-10-29 08:15:31.868 JST [106144] LOG:  listening on IPv6 address "::1", port 5432
2020-10-29 08:15:31.868 JST [106144] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2020-10-29 08:15:31.870 JST [106144] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-10-29 08:15:31.871 JST [106146] LOG:  database system was shut down at 2020-10-29 08:14:46 JST
2020-10-29 08:15:31.872 JST [106144] LOG:  database system is ready to accept connections
2020-10-29 08:15:44.315 JST [106154] WARNING:  page verification failed, calculated checksum 31945 but expected 35621
2020-10-29 08:15:44.315 JST [106154] ERROR:  invalid page in block 0 of relation base/13751/16384
2020-10-29 08:15:44.315 JST [106154] STATEMENT:  SELECT * FROM data1;
TRAP: BadArgument("MemoryContextIsValid(context)", File: "../../../../src/include/utils/memutils.h", Line: 130, PID: 106154)
postgres: postgres postgres [local] SELECT(ExceptionalCondition+0x7a)[0x901c9a]
postgres: postgres postgres [local] SELECT(pfree+0x55)[0x92ded5]
postgres: postgres postgres [local] SELECT(pg_relation_check_pages+0x252)[0x887b42]
postgres: postgres postgres [local] SELECT(ExecMakeFunctionResultSet+0xf8)[0x6680e8]
postgres: postgres postgres [local] SELECT[0x68e6ab]
postgres: postgres postgres [local] SELECT[0x68e7e5]
postgres: postgres postgres [local] SELECT(standard_ExecutorRun+0x142)[0x65e052]
postgres: postgres postgres [local] SELECT[0x7e4a6b]
postgres: postgres postgres [local] SELECT(PortalRun+0x278)[0x7e5fb8]
postgres: postgres postgres [local] SELECT[0x7e1794]
postgres: postgres postgres [local] SELECT(PostgresMain+0xef6)[0x7e2a16]
postgres: postgres postgres [local] SELECT[0x484a38]
postgres: postgres postgres [local] SELECT(PostmasterMain+0x1145)[0x759305]
postgres: postgres postgres [local] SELECT(main+0x45e)[0x48619e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6794118555]
postgres: postgres postgres [local] SELECT[0x486205]
2020-10-29 08:16:10.905 JST [106144] LOG:  server process (PID 106154) was terminated by signal 6: Aborted
2020-10-29 08:16:10.905 JST [106144] DETAIL:  Failed process was running: SELECT pg_relation_check_pages('data1', 'main');
2020-10-29 08:16:10.905 JST [106144] LOG:  terminating any other active server processes
2020-10-29 08:16:10.906 JST [106157] FATAL:  the database system is in recovery mode
2020-10-29 08:16:10.908 JST [106144] LOG:  all server processes terminated; reinitializing
2020-10-29 08:16:10.922 JST [106160] LOG:  database system was interrupted; last known up at 2020-10-29 08:15:31 JST
2020-10-29 08:16:10.926 JST [106160] LOG:  database system was not properly shut down; automatic recovery in progress
2020-10-29 08:16:10.927 JST [106160] LOG:  redo starts at 0/1901D10
2020-10-29 08:16:10.927 JST [106160] LOG:  invalid record length at 0/1904308: wanted 24, got 0
2020-10-29 08:16:10.927 JST [106160] LOG:  redo done at 0/19042D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2020-10-29 08:16:10.930 JST [106144] LOG:  database system is ready to accept connections
$
