Re: pgsql: Validate page level checksums in base backups

2018-04-15 Thread Magnus Hagander
On Tue, Apr 10, 2018 at 11:45 PM, David Steele wrote: > On 4/10/18 5:24 PM, Tomas Vondra wrote: > >> >> I think there's a bug in sendFile(). We do check checksums on all pages >> that pass this LSN check: >> >> /* >> * Only check pages which have not been modified since the >> *

Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Michael Paquier
On Tue, Apr 10, 2018 at 11:35:21PM +0200, Tomas Vondra wrote: > BTW pg_verify_checksums needs the same fix. Yes you are right here. Just for the record: what needs to be done is to check for PageIsNew() in scan_file() before fetching pg_checksum_page() and after reading an individual block, which

Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread David Steele
On 4/10/18 5:24 PM, Tomas Vondra wrote: I think there's a bug in sendFile(). We do check checksums on all pages that pass this LSN check: /* * Only check pages which have not been modified since the * start of the base backup. Otherwise, they might have been * written onl

Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Tomas Vondra
On 04/10/2018 11:24 PM, Tomas Vondra wrote: > Hi, > > I think there's a bug in sendFile(). We do check checksums on all pages > that pass this LSN check: > > /* > * Only check pages which have not been modified since the > * start of the base backup. Otherwise, they might have bee

Re: pgsql: Validate page level checksums in base backups

2018-04-10 Thread Tomas Vondra
Hi, I think there's a bug in sendFile(). We do check checksums on all pages that pass this LSN check: /* * Only check pages which have not been modified since the * start of the base backup. Otherwise, they might have been * written only halfway and the checksum would not be va

Re: pgsql: Validate page level checksums in base backups

2018-04-06 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 2:41 PM, Michael Banck wrote: > Hi, > > On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote: > > On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck > > > wrote: > > > Otherwise, I had a quick look and there is no obvious outlier; the > > > pgdata is 220 MB after the

Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Alvaro Herrera
Michael Banck wrote: > Hi, > > Do we have a precedent somewhere for how we do this, or does our test > > framework already have a way to do it? How are all the actual data > > directories etc cleaned up? > > They (and the base backups) are getting purged on success of the whole > testsuite. So to

Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Michael Banck
Hi, On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote: > On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck > wrote: > > Otherwise, I had a quick look and there is no obvious outlier; the > > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that > > could be cut down som

Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck wrote: > Hi, > > On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote: > > On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera > > wrote: > > > Michael Banck wrote: > > > > > > > However, the pg_basebackup testsuite takes up 800+ MB to run, > >

Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Michael Banck
Hi, On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote: > On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera > wrote: > > Michael Banck wrote: > > > > > However, the pg_basebackup testsuite takes up 800+ MB to run, > > > > Uh, you're right. This seems a bit over the top. Can we reduce

Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera wrote: > Michael Banck wrote: > > > However, the pg_basebackup testsuite takes up 800+ MB to run, > > Uh, you're right. This seems a bit over the top. Can we reduce that > without losing coverage? We've gone great lengths to avoid large > amounts

Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Alvaro Herrera
Michael Banck wrote: > However, the pg_basebackup testsuite takes up 800+ MB to run, Uh, you're right. This seems a bit over the top. Can we reduce that without losing coverage? We've gone great lengths to avoid large amounts of data in tests elsewhere. -- Álvaro Herrerahttps

Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Michael Banck
Hi, On Wed, Apr 04, 2018 at 11:38:35AM +0200, Magnus Hagander wrote: > On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck > wrote: > > > Hi, > > > > On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote: > > > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane wrote: > > > I'd bet a good lunch that

Re: pgsql: Validate page level checksums in base backups

2018-04-04 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck wrote: > Hi, > > On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote: > > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane wrote: > > I'd bet a good lunch that nondefault BLCKSZ would break it, as well, > > > since the way in which the corruptio

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread David Steele
On 4/3/18 4:48 PM, Michael Banck wrote: Attached is a patch which does that hopefully: 1. creates two user tables, one large enough for at least 6 blocks (around 360kb), the other just one block. 2. stops the cluster before scribbling over its data and starts it afterwards. 3. uses the blocks

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Michael Banck
Hi, On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote: > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane wrote: > I'd bet a good lunch that nondefault BLCKSZ would break it, as well, > > since the way in which the corruption is induced is just guessing > > as to where page boundaries are.

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander writes: > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane wrote: >> It's scribbling on the source cluster's disk files and assuming that that >> translates one-for-one to what gets sent to the slave server --- but what >> if some of the blocks that it modifies on-disk are resident in the

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane wrote: > Magnus Hagander writes: > > Yeah, there's clearly a second problem here. > > I think this test script is broken in many ways. > > It's scribbling on the source cluster's disk files and assuming that that > translates one-for-one to what gets sent

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Peter Geoghegan
On Tue, Apr 3, 2018 at 11:29 AM, Tom Lane wrote: > Also, scribbling on tables as sensitive as pg_class is just asking for > trouble IMO. I don't see anything in this test, for example, that > prevents autovacuum from running and causing a PANIC before the test > can complete. +1 > Even with AV

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Tom Lane
Magnus Hagander writes: > Yeah, there's clearly a second problem here. I think this test script is broken in many ways. It's scribbling on the source cluster's disk files and assuming that that translates one-for-one to what gets sent to the slave server --- but what if some of the blocks that i

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Magnus Hagander
On Tue, Apr 3, 2018 at 7:13 PM, Andres Freund wrote: > On 2018-04-03 11:52:26 +, Magnus Hagander wrote: > > Validate page level checksums in base backups > > > > When base backups are run over the replication protocol (for example > > using pg_basebackup), verify the checksums of all data blo

Re: pgsql: Validate page level checksums in base backups

2018-04-03 Thread Andres Freund
On 2018-04-03 11:52:26 +, Magnus Hagander wrote: > Validate page level checksums in base backups > > When base backups are run over the replication protocol (for example > using pg_basebackup), verify the checksums of all data blocks if > checksums are enabled. If checksum failures are encount